We [1] developed IoT Inspector, an open-source tool that allows owners of smart home devices to monitor their devices' network traffic and discover potential security and privacy risks. In this article, we discuss some of what we discovered, as well as the problems we face in collecting reliable and accurate data.
Since the COVID-19 pandemic, new trends have emerged in how people interact with their homes. To many adults, the home is both a place to rest and a place to work. To many children who cannot attend schools, the home is their new learning environment. Also, to individuals with underlying health issues, the home is where they seek care and refuge.
This shift has come with the increasing adoption of smart home technologies, also known as the Internet of Things (IoT): for example, smart TVs and speakers for entertainment, smart toys for children to play with, and smart health devices for at-home healthcare monitoring and intervention.
Smart home devices raise security and privacy concerns, but researchers who study them face a major challenge: the devices are physical objects, and they come in a large variety of types and from many manufacturers. Unlike mobile apps, these physical devices are difficult to set up and analyze automatically at scale. As such, many studies on smart home security and privacy are limited to a small subset of devices in the lab. One way to scale up is to scan the IPv4 address space of the Internet, but the results are restricted to Internet-exposed devices and overlook devices on private home networks.
When we faced this challenge back in 2019, we asked ourselves: “Can we ask real users to run experiments for us, since they are the ones with the large variety of smart home devices?” We thought of paying participants, but we wanted to avoid being limited by our budget. We wanted a way for users to help us run experiments willingly, because they themselves would benefit.
Figure 1 shows an example of IoT Inspector's main dashboard, provided by a member of the research team in their own smart home. It shows the network traffic of various smart devices in the past 20 minutes. Users can also view the network activities of individual devices. For example, Ira Flatow, a reporter with National Public Radio, independently used IoT Inspector to analyze the network traffic of their Roku TV. As shown in Figure 2, Flatow shared a screenshot of IoT Inspector in action, which shows that the Roku TV contacted a number of advertising services, including Scorecard Research and Alphabet (DoubleClick).
We launched IoT Inspector in April 2019. Since then, IoT Inspector has collected the network traffic from more than 63,000 devices. This dataset includes traffic metadata, such as the remote IP addresses, hostnames, and ports, aggregated over 5-second windows. The dataset also includes names and manufacturers of a subset of the devices. Interested readers can view a sample dataset at this link.
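To make the 5-second aggregation concrete, here is a minimal sketch of how per-packet metadata could be rolled up into windows. It is an illustration under our own assumptions, not IoT Inspector's actual code: the `Packet` fields, the `device_id` identifier, and the `aggregate` function are all hypothetical names.

```python
from collections import defaultdict
from typing import NamedTuple

WINDOW_SECONDS = 5  # window size stated in the article


class Packet(NamedTuple):
    """Hypothetical per-packet metadata record (no payload)."""
    timestamp: float   # seconds since the epoch
    device_id: str     # hypothetical per-device identifier
    remote_ip: str
    remote_port: int
    length: int        # packet size in bytes


def aggregate(packets):
    """Sum bytes per (window start, device, remote IP, remote port)."""
    totals = defaultdict(int)
    for p in packets:
        # Round the timestamp down to the start of its 5-second window.
        window_start = int(p.timestamp // WINDOW_SECONDS) * WINDOW_SECONDS
        key = (window_start, p.device_id, p.remote_ip, p.remote_port)
        totals[key] += p.length
    return dict(totals)
```

Keying on the window start keeps only byte counts per remote endpoint, which is the kind of metadata described above rather than packet contents.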
A common question that our users ask is: “What is this device on my network?” When a user runs IoT Inspector for the first time, IoT Inspector shows a list of devices on the network. The user can choose one or multiple devices to "inspect", have the traffic captured, and view the analysis.
We currently use the following features to infer device identities, although not all devices can be identified by IoT Inspector:
- MAC OUI. The first 3 bytes of a device's MAC address identify the company that manufactured the Wi-Fi chip, rather than the device itself. This generally works well for, say, Amazon devices, but not for others. For example, we often see the name “Espressif,” a popular manufacturer of IoT boards used by many brands. In such cases, the OUI is not helpful for device identification.
- DHCP hostname. A device may announce its hostname as it obtains an IP address via DHCP. For example, some smart door locks announce themselves this way. The problem is that few devices announce their hostnames via DHCP.
- HTTP user agent. The user agent string sometimes reveals what the device is. For example, a Samsung TV's HTTP user agent string would include the term “Tizen.” The problem is that the HTTP traffic must be in plaintext for IoT Inspector to see the user agent string; the widespread adoption of HTTPS makes this difficult. Some user agent strings are also uninformative, such as when the user agent is simply “curl.”
- mDNS and UPnP announcements. Again, they are useful in identifying devices—just like DHCP hostnames—but the problem is that not all devices support mDNS and UPnP.
- Hostnames contacted by the smart device. They are useful when the hostnames can uniquely identify the device; for example, roku.com is typically contacted by Roku TVs. However, hostnames that belong to popular infrastructure providers, such as AWS, are not useful for device identification.
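The features above can be combined into simple heuristics. The sketch below is only an illustration, not IoT Inspector's implementation: the function names and hint tables are hypothetical, and the only table entries are the two examples mentioned in the list (the “Tizen” user agent and roku.com). A real deployment would also need an OUI-to-vendor table, which we leave out rather than invent entries.

```python
HEX_DIGITS = set("0123456789abcdefABCDEF")

# Hypothetical hint tables, seeded only with the examples from the text.
USER_AGENT_HINTS = {"tizen": "Samsung TV"}
HOSTNAME_HINTS = {"roku.com": "Roku TV"}


def mac_oui(mac: str) -> str:
    """Return the first 3 bytes of a MAC address as uppercase hex."""
    digits = "".join(c for c in mac if c in HEX_DIGITS)
    if len(digits) != 12:
        raise ValueError(f"not a 48-bit MAC address: {mac!r}")
    return digits[:6].upper()


def guess_device(dhcp_hostname=None, user_agent=None, contacted_hostnames=()):
    """Return a list of (feature, guess) pairs; empty if nothing matched."""
    guesses = []
    if dhcp_hostname:
        # A DHCP hostname is reported as-is; many devices never send one.
        guesses.append(("dhcp_hostname", dhcp_hostname))
    if user_agent:
        ua = user_agent.lower()
        for needle, label in USER_AGENT_HINTS.items():
            if needle in ua:
                guesses.append(("user_agent", label))
    for host in contacted_hostnames:
        host = host.lower().rstrip(".")
        for suffix, label in HOSTNAME_HINTS.items():
            # Match the domain itself or any subdomain of it.
            if host == suffix or host.endswith("." + suffix):
                guesses.append(("hostname", label))
    return guesses
```

Returning every matching feature, rather than a single answer, reflects the point made above: each signal is unreliable on its own, so a classifier or a human still has to weigh them.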
In addition to inferring device identities, IoT Inspector also asks users to label their devices with the device name and manufacturer. Effectively, we're crowdsourcing device identities from users, but the user labels can be noisy.
There are three problems with users' manual labels: missing labels, inconsistent labels, and wrong labels.
- Missing labels. Slightly less than half of our users labeled at least one of their devices, telling us the names and manufacturers of devices. Of all the devices, only 25% have user labels.
- Inconsistent labels. Users currently label their devices through a dropdown list (Figure 3). We cannot possibly list every single device name there, so we let users enter free text too. Although the free text gives users the flexibility of labeling devices that we do not already know, free text gives us inconsistent labels. For example, a user could label an Amazon Echo as “Amazon Echo” or “Amazon Alexa.” Both are equivalent, but we would have to train our classifier to know that. This is just one of many examples of different ways to label the same device.
- Wrong labels. We have seen a device labeled as a “smart fan” communicating with some Android domains and hundreds of advertising services. When we checked the smart fan's official website, the fan did not seem to run Android. It is likely that the device was labeled incorrectly.
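One way to reduce inconsistent free-text labels is to normalize them before training a classifier. The following is a minimal sketch under our own assumptions: the `ALIASES` table and `normalize_label` function are hypothetical, and the only alias shown is the Amazon Echo example from the list above. It does not address missing or wrong labels.

```python
import re

# Hypothetical alias table mapping cleaned-up free text to one canonical name.
ALIASES = {
    "amazon alexa": "Amazon Echo",
    "amazon echo": "Amazon Echo",
}


def normalize_label(raw: str):
    """Map a free-text device label to a canonical name.

    Returns None for empty input, and the cleaned-up lowercase text
    when no alias is known.
    """
    # Lowercase, then collapse punctuation and whitespace into single spaces.
    key = re.sub(r"[^a-z0-9]+", " ", raw.lower()).strip()
    if not key:
        return None
    return ALIASES.get(key, key)
```

For example, `normalize_label("  Amazon-Alexa ")` and `normalize_label("AMAZON ECHO")` both map to the same canonical name, so the two spellings no longer look like different devices.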
In short, we can gather a large dataset from real-world users, but we need to improve the label quantity and quality.
Many of our users ask: “What is my device doing?” Recall that Figure 2 shows the network activities of a Roku TV, where each color corresponds to a third-party service contacted by the TV. We obtain these names from DNS packets or, in some cases, from the Server Name Indication (SNI) field in TLS ClientHello messages. But what if certain DNS packets are missing or the lookups are cached, or what if ClientHello messages are missing? When that happens, IoT Inspector does not know what company a smart device is talking to.
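The dependence on observed DNS and SNI data can be sketched as a small lookup table. This is an illustration only: the `HostnameResolver` class and its methods are hypothetical, and the choice to prefer DNS answers over SNI is our assumption rather than documented IoT Inspector behavior.

```python
class HostnameResolver:
    """Map remote IP addresses to hostnames seen in captured traffic."""

    def __init__(self):
        self._hostname_by_ip = {}

    def observe_dns_answer(self, hostname: str, ip: str) -> None:
        # A DNS answer seen on the wire ties a hostname to an IP address.
        self._hostname_by_ip[ip] = hostname

    def observe_sni(self, ip: str, server_name: str) -> None:
        # Use SNI only when no DNS answer was captured for this IP.
        self._hostname_by_ip.setdefault(ip, server_name)

    def lookup(self, ip: str):
        # None means the capture missed both the DNS answer and the
        # ClientHello, so the remote party stays unidentified.
        return self._hostname_by_ip.get(ip)
```

A `None` result corresponds to the failure case described above: if the device resolved the name before the capture started, or the ClientHello was not seen, only the bare IP address is left.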
Even if IoT Inspector knows the remote hostname, the user may not know what the remote hostname means in terms of the device's activity. For example, many Belkin Wemo smart plugs communicate with “api.xbcs.net.” What's xbcs.net? It does not bear the name of the company, Wemo. If you visit xbcs.net, there's no web server. Basically, it is hard to tell what xbcs is, whether it is related to Belkin, or what the device is doing.
Also, the truth could be spooky to users. Here is a real story. A user emailed us and asked: My device is communicating with a military domain; is the military spying on me? The third-party service that their device contacted was “tock.usno.navy.mil”, which is in fact an NTP time server. There are thousands of time servers in the world. It just so happens that the person's device is using this particular time server operated by the Navy.
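One possible way to make such traffic less alarming is to show a plain-language guess at its purpose next to the hostname. The sketch below is a hypothetical illustration, not an IoT Inspector feature: the function name and wording are ours, and a port number only suggests a protocol without proving what the traffic contains.

```python
# Well-known port assignments; the plain-language wording is our own.
PORT_EXPLANATIONS = {
    ("udp", 123): "time synchronization (NTP)",
    ("udp", 53): "domain name lookup (DNS)",
    ("tcp", 443): "encrypted web traffic (HTTPS)",
}


def explain_flow(hostname: str, protocol: str, port: int) -> str:
    """Return a short, non-technical description of a network flow."""
    proto = protocol.lower()
    purpose = PORT_EXPLANATIONS.get((proto, port))
    if purpose is None:
        return f"{hostname}: purpose unknown (port {port}/{proto})"
    return f"{hostname}: likely {purpose}"
```

With this, a flow to tock.usno.navy.mil on UDP port 123 would be described as likely time synchronization, which gives the user context before they see the military domain name.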
In general, it is tricky to communicate device activities to users. Information could be missing. If it is not missing, we need to be careful not to spook users if they do not have a strong technical background.
How do we convince more users to use, and keep using, IoT Inspector? This is a crowdsourced study. We need more users and more user engagement. Since we do not pay them, we need to build a better product for them.
Currently, we do not have a large number of active users, and many of our users have not labeled their devices. As of June 2022, our users have collectively scanned more than 200,000 devices. Users inspected about a third of these devices, meaning that they had IoT Inspector capture and analyze the device traffic. These inspected devices belong to more than 6,400 users, but the median duration of running IoT Inspector is only about 40 minutes. Only about a quarter of the devices were labeled, and these belong to about 2,900 users.
So there is room for improvement—for instance, longer duration beyond 40 minutes. Can we encourage users to run IoT Inspector for days? Can we get more users to inspect more devices and label more devices?
Since we are not paying the users, we can potentially attract more users and keep them engaged longer by building a better product—one with a better user interface and user experience, for example, by offering more usable information about what their devices are doing behind the scenes. We are currently partnering with Consumer Reports to polish the user experience with a team of professional UI/UX developers. Furthermore, we can also increase the duration of use by deploying IoT Inspector on Raspberry Pi.
To find out more about what our users want, we conducted a number of focus groups with users. One common theme is that users want not just more information about their smart home security, but also ways to act on it. Can we let users block devices and certain connections? For example, a user could have an indoor camera. Can IoT Inspector block the camera when the user is at home but unblock it when the user is out? These are some of the features in progress for the upcoming version of IoT Inspector.
The past three years have been eye-opening for us. We gathered a large traffic dataset of devices from a large number of users around the world. We faced a lot of difficulties as both operators of IoT Inspector and researchers using the dataset. As we work on the next major release, we welcome readers to experiment with the code, share feedback, and even collaborate with us.
Just as ImageNet is a useful tool for computer vision researchers, we want to make IoT Inspector a useful platform for smart home researchers—not only in security and privacy, but also in other fields such as networking and machine learning—for years to come.