VeriSign CZAG: Privacy Leak in X.509 Certificates
Scott G. Renfro
Code | Country | Code | Country |
AU | Australia | JP | Japan |
AT | Austria | MX | Mexico |
BE | Belgium | NL | Netherlands |
BR | Brazil | NO | Norway |
CA | Canada | ZA | South Africa |
CN | China | ES | Spain |
DK | Denmark | SE | Sweden |
FI | Finland | CH | Switzerland |
FR | France | TW | Taiwan |
DE | Germany | GB | United Kingdom |
IN | India | US | United States |
IL | Israel | UU | Other Countries |
IT | Italy |
Similarly, the zip code is represented as the ten ASCII characters entered by the subscriber. The contents are left-padded with spaces (i.e., when the user enters fewer than ten characters during registration, spaces are inserted from the left until the string totals ten characters). The ten characters are located in bytes 39 to 48 of the plaintext.
The date of birth entered by the subscriber is represented as six characters from bytes 50 to 55 of the plaintext. The characters are formatted MMDDYY so that February 01, 1970 appears as 020170.
Finally, gender is simply one character - either M for Male or F for Female. This character is byte 57 in the plaintext. We didn't find any certificates where this byte decoded to some other value of gender.
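The field layout described above can be sketched as a small decoder. This is only an illustration: it assumes the byte positions are zero-based offsets into the already-decrypted plaintext (the text does not say whether counting starts at 0 or 1), and the function name `parse_czag_fields` and the dictionary keys are hypothetical.

```python
# Assumed zero-based, inclusive byte offsets taken from the description above.
ZIP_SLICE = slice(39, 49)   # bytes 39..48: zip code, left-padded with spaces
DOB_SLICE = slice(50, 56)   # bytes 50..55: date of birth as MMDDYY
GENDER_OFFSET = 57          # byte 57: 'M' or 'F'


def parse_czag_fields(plaintext: bytes) -> dict:
    """Extract zip code, date of birth, and gender from a decrypted payload."""
    if len(plaintext) <= GENDER_OFFSET:
        raise ValueError("plaintext too short for the described layout")

    zip_code = plaintext[ZIP_SLICE].decode("ascii").lstrip(" ")
    dob = plaintext[DOB_SLICE].decode("ascii")
    gender = chr(plaintext[GENDER_OFFSET])

    if not dob.isdigit():
        raise ValueError("date of birth is not six digits")
    if gender not in ("M", "F"):
        raise ValueError("unexpected gender byte")

    return {
        "zip": zip_code,
        "birth_month": int(dob[0:2]),
        "birth_day": int(dob[2:4]),
        "birth_year_2digit": int(dob[4:6]),  # the century is not encoded
        "gender": gender,
    }
```

Because only two digits of the year are stored, any age calculation has to guess the century, which is one plausible source of the implausible ages mentioned in the statistics.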
Using the preceding information, sample certificates were retrieved from VeriSign's public directory server at ldap://directory.verisign.com (see footnote 6) and analyzed to verify the decryption and decoding algorithms, while collecting some broad statistics.
Of the certificates examined, 77% included the CZAG extension. Summarized demographics of those certificates are listed in Table 2. These statistics are based on a non-random sample of 16,285 certificates after removing those with either missing data, age less than 10 years, or age greater than 80 years. Certificates with ages greater than 80 and less than ten nearly always resulted from intentional or inadvertent data entry errors (e.g. listing the current year in the subscriber's date of birth).
Category | Male | % | Female | % | Total | % |
Age 10 - 18 | 162 | (1%) | 23 | (<1%) | 185 | (1%) |
Age 18 - 24 | 785 | (5%) | 156 | (1%) | 941 | (6%) |
Age 25 - 34 | 2,992 | (18%) | 577 | (4%) | 3,569 | (22%) |
Age 35 - 45 | 4,065 | (25%) | 746 | (5%) | 4,811 | (30%) |
Age 45 - 55 | 3,796 | (23%) | 648 | (4%) | 4,444 | (27%) |
Age 55 - 65 | 1,569 | (10%) | 204 | (1%) | 1,773 | (11%) |
Age 65 - 80 | 495 | (3%) | 67 | (<1%) | 562 | (3%) |
Total | 13,864 | (85%) | 2,421 | (15%) | 16,285 | (100%) |
We also estimated the total number of certificates issued based on numbers VeriSign has made public directly or through news reports. Prior to the CZAG extension being introduced, VeriSign had issued more than 750,000 consumer end-entity certificates (excluding SSL server certificates) [14]. VeriSign passed the 3 million client certificate mark sometime between October, 1998 [5] and June, 1999 [4], not including OnSite certificates for corporate users. This allows us to conservatively estimate that VeriSign issued more than 5 million client certificates during the time that CZAG extensions were being embedded. Given the 77% inclusion rate, we deduce that more than 3 million certificates issued between 1997 and 2002 may have included the CZAG extension.
From the preceding discussion and our examination of thousands of certificates over several years, it is clear that only a single encryption key was used to encrypt this data. As a result, VeriSign had no way to revoke access to the data by a trusted third party whose agreement had either expired or been terminated.
Although the initial press releases [15] claimed that a third party who violated the privacy pledge would have their license revoked, there is no obvious technical mechanism to enforce this action. The CZAG extension was simply too small to contain separately encrypted message encryption keys for each trusted third party. As a result, VeriSign would have had to revoke access to all parties by changing the symmetric encryption key, only distributing the new key to those parties that were still trusted.
Finally, such an approach didn't prevent these now-untrusted third parties from accessing the data within older certificates, only within the certificates encrypted with the new key. Since the certificates are issued with a one-year validity period, it may take up to one year for these old certificates to fall out of circulation.
In addition to technical issues, the VeriSign example illustrates a common social/policy issue: there was a clear discrepancy between user expectations based on marketing literature and VeriSign's actual commitment and implementation.
Subscribers, including a few industry experts to whom we sent birthday greetings during our original research, incorrectly believed that their information was only available to web sites licensed by VeriSign. Because the information was encrypted, it was not visibly exposed when inspecting the certificate in a web browser or with ASN.1 parsers.
Additionally, most of the public documentation implies limited distribution and availability of the user's demographics when provided - even to the technically knowledgeable reader. The registration page indicates the data is ``presented to participating web sites''[19], while a press release[16] claims ``consumers have complete control of ... whether or not to present it at sites they visit.'' Finally, VeriSign's original announcement asserted that participating sites must adhere to a consumer privacy pledge[16] and sites which violate the provisions of the pledge will have their reader license revoked by VeriSign[15].
Although much of this documentation implies limited distribution and strong protection, VeriSign never explicitly asserted such protection. In fact, the word ``encryption'' was never used in conjunction with the feature and their privacy statement points out that ``VeriSign's CPS requires VeriSign to publish all subscriber certificates within the Public Certification Services. Consequently, a subscriber should have no expectation of privacy regarding the content of his or her Digital ID.''[18]
This conflict is common among on-line services. They have a strong economic motivation to build trust among users, yet cannot explicitly misrepresent data handling and disclosure procedures. As a result, most public descriptions euphemistically describe the feature's positive aspects and gloss over any risks. Finally, the two types of text - legal and marketing - are typically authored by different people with somewhat opposing motivations, increasing the confusion.
This situation has only worsened over the last several years as economic stakes increase and user expertise drops. Many hope that the Platform for Privacy Preferences Project (P3P) will yield a useful and constrained language used to communicate these privacy practices to users, and allow users to easily program their user agents to automatically take action based on a site's machine-readable policies[6]. Ultimately, however, users who want to control information on a per-attribute basis may find P3P wanting.
The third interesting characteristic of the VeriSign example has been its resistance to change. Although it was rolled out more than five years ago with known weaknesses and was never a significant source of revenue, VeriSign has been reluctant to remove the feature from their registration process.
When launched in 1997, VeriSign announced that a handful of companies had already agreed to license the software and use the data in their registration process. These companies hoped to get accurate demographics from users who were registering for services. There was also an expectation that users would be more willing to register at new sites since the information required by the service provider was transferred automatically. Finally, this was a new application for the use of client-side certificates. Ideally, the lure of one password and one-step registration would encourage users to pay VeriSign to get such a convenience at the same time sites were paying for access to the data.
Unfortunately for VeriSign, and much of the PKI industry, the use of client-side SSL certificates never really caught on. There were several problems, including software compatibility, mobility, and lack of motivation. Since so few sites supported the CZAG extension or client authentication SSL, the certificates were relegated to use in secure e-mail applications, and the initial sites dropped support.
In January 2000, when we first examined the CZAG extension, we provided an early predecessor of this paper to VeriSign, and later other members of the information security community. In response, VeriSign said that the problems were old news, the extension had been long forgotten by sites, and concerned users could always opt-out. Although true, thousands of users registering for new certificates each month were still supplying demographics only to have them published in VeriSign's public LDAP directory.
In retrospect, their response was not surprising. As Anderson points out [1], the real driving forces behind information security issues commonly have more to do with perverse economic incentives than technical issues. When a party chooses to leave a potential security problem in place rather than correct it, it may well be the result of their cost-benefit analysis.
In this case, having the feature in place cost VeriSign nothing while removing the feature incurred cost and risk. Since the initial sites had all dropped support, nobody was paying for the privilege of accessing the data. Thus there was nobody to complain that others, who hadn't paid a licensing fee, also had access. Despite any theoretical cost borne by unsuspecting users, it made economic sense to delay removal until there was independent cause to do so.
The VeriSign CZAG extension is just one example of a relatively common problem in X.509 deployments: how can a Certification Authority share sensitive information with some subset of relying parties without unnecessarily disclosing that information to unauthorized parties?
This problem arises in a surprising number of PKI deployments. For example, ``Qualified Certificates'' are designed for use in legally binding contexts and may contain a government-issued unmistakable identifier (e.g., social security or driver's license number)[13]. Yet broad publication of this identifier, bound to the associated subject name, may lower the barrier to identity theft. Other examples include patient identifiers, professional license numbers, and internal corporate usernames, hostnames, or IP addresses. Even embedded authorization data may disclose information that is useful to both attackers and users.
As a result, many system designers struggle with whether to include such information and, if included, how to protect it. In the following sections, we outline goals and design constraints that should be considered and then suggest some directions that may lead to appropriate solutions. We discuss embedding opaque identifiers that point to an online database, protecting the data with various key management schemes, and tools to allow the user to control disclosure.
We don't further consider the obvious case of simply accepting the risk posed by embedding sensitive information in certificates without further protection.
Although we implicitly touched upon many of the goals while discussing the CZAG example, it is useful to state these explicitly. Our overall objective is to make certified information available to, and only to, a mutable subset of relying parties. We call members of this subset ``trusted relying parties.''
The format of X.509 certificates and deployment model characteristics (e.g., on-line vs. off-line systems) combine to impose several design constraints on the problem. These constraints eliminate many of the traditional solutions to similar problems.
Designers must add their own goals and constraints to this baseline. For example, real-time communication with the Certification Authority may not be possible in off-line systems. Some on-line systems may have performance requirements that preclude additional network transactions during certificate validation. Finally, some systems may have legal restrictions imposed on the strength of the protection employed. Most systems being deployed today, however, are database-centric and on-line, performing network transactions during certificate validation (e.g., Online Certificate Status Protocol checks).
The most obvious solution to this problem is to not embed the sensitive data in the public certificate at all. Instead, the Certification Authority stores sensitive information in a centralized database. This information can be indexed by any of several values embedded in the certificate, including an opaque identifier, the subject distinguished name, or the (issuer name, serial number) pair, which is required to be unique within X.509.
Trusted relying parties can query the database with their own credentials and the index of the information desired. Assuming they are authorized, the information is returned. Their interface to this database may be LDAP, a relational database protocol, or a custom application protocol.
If additional protection against data modification while in storage is desired, the Certification Authority can use issue-time binding. One approach is to append a nonce to the information in the database and include the hash of the information and nonce in the end-entity certificate. The nonce prevents identical information from hashing to the same result. This allows the relying party to verify that the information has not changed since the certificate was generated. On the other hand, if the information needs frequent updates, issue-time binding may not be appropriate.
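Issue-time binding as described above might look like the following sketch. The choice of SHA-256, the 16-byte nonce, and the function names `bind_at_issue` and `verify_binding` are assumptions made for illustration; the text does not prescribe a particular hash function or nonce size.

```python
import hashlib
import hmac
import secrets


def bind_at_issue(info: bytes, nonce_bytes: int = 16) -> tuple[bytes, bytes]:
    """Return (nonce, digest).

    The nonce is stored next to the record in the database; the digest is
    what would be embedded in the end-entity certificate.
    """
    nonce = secrets.token_bytes(nonce_bytes)
    # The nonce is appended to the information before hashing.
    digest = hashlib.sha256(info + nonce).digest()
    return nonce, digest


def verify_binding(info: bytes, nonce: bytes, digest_from_cert: bytes) -> bool:
    """Check that the database record still matches the certificate's digest."""
    candidate = hashlib.sha256(info + nonce).digest()
    return hmac.compare_digest(candidate, digest_from_cert)
```

Two subscribers with identical attributes receive different digests because each record gets a fresh nonce, so the certificate alone does not reveal that their data matches.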
The primary drawbacks to a database-centric solution are latency and the risk posed by a central database. Additionally, such a scheme is not practical for off-line systems unless they can batch requests for later analysis.
In return, the system can ensure that only authorized parties have access to the database and not expose ciphertexts to untrusted parties. Further, access revocation can be nearly instantaneous and totally independent. In this case, once a party's access has been revoked, they lose access to all data that they have not previously stored within their own systems. Finally, a database can allow finer grained access control and greater flexibility under changing requirements.
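The database-centric approach can be illustrated with a toy in-memory directory. Everything here is hypothetical (the class name `AttributeDirectory`, its methods, and the use of plain strings as relying-party identifiers); a real deployment would authenticate the relying party and sit behind LDAP or another protocol, as noted above.

```python
from dataclasses import dataclass, field


@dataclass
class AttributeDirectory:
    """Toy stand-in for the Certification Authority's centralized database."""
    records: dict = field(default_factory=dict)   # (issuer, serial) -> attributes
    authorized: set = field(default_factory=set)  # trusted relying-party ids

    def store(self, issuer: str, serial: int, attributes: dict) -> None:
        self.records[(issuer, serial)] = dict(attributes)

    def grant(self, party: str) -> None:
        self.authorized.add(party)

    def revoke(self, party: str) -> None:
        # Takes effect on the very next query; other parties are unaffected.
        self.authorized.discard(party)

    def lookup(self, party: str, issuer: str, serial: int) -> dict:
        if party not in self.authorized:
            raise PermissionError(f"{party} is not a trusted relying party")
        return dict(self.records[(issuer, serial)])
```

Revoking one party here changes nothing for the others, which is the independence property the key-based schemes below struggle to match.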
When the application requires that information be embedded in certificates (e.g., because it is off-line), better key management schemes can improve the security of this practice.
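First, the data should be better protected against cryptanalysis. One option is to use a block cipher with a random, per-certificate IV (i.e., $C = IV \,\|\, E_K(IV, M)$, where $M$ is the plaintext, $K$ is the shared key, and $\|$ denotes concatenation).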
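Alternatively, it is possible to encrypt the data with either a stream or block cipher using a per-certificate key derived from a single master key combined with unique information in the certificate (e.g., the public key or subject distinguished name). In this case, $K_c = f(K_m, U)$ and $C = E_{K_c}(M)$, where $K_m$ is the master key, $U$ is the unique certificate information, and $f$ is a one-way key derivation function.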
Such simple approaches ensure that only those with proper key material can decrypt the data and are sufficient in a closed system where the Certification Authority is the only relying party (see footnote 7). In all other systems, revocation remains an issue.
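The per-certificate key derivation mentioned above can be sketched with a keyed hash. Using HMAC-SHA-256 as the derivation function, truncating to 16 bytes, and the name `derive_certificate_key` are all assumptions for illustration; the text only requires that the key depend on both the master key and certificate-unique data.

```python
import hashlib
import hmac


def derive_certificate_key(master_key: bytes, unique_info: bytes,
                           length: int = 16) -> bytes:
    """Derive a per-certificate key from the master key and unique cert data.

    unique_info could be, for example, the DER-encoded public key or the
    subject distinguished name taken from the certificate.
    """
    if not 1 <= length <= hashlib.sha256().digest_size:
        raise ValueError("length must be between 1 and 32 bytes")
    return hmac.new(master_key, unique_info, hashlib.sha256).digest()[:length]
```

A relying party holding the master key can recompute the same key from the certificate alone, so no per-certificate key material needs to be distributed.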
When the subset of trusted relying parties is small or only changes infrequently, it may be sufficient to have a single master key which is changed according to some fixed schedule. This allows a tradeoff between administrative overhead and revocation speed (see footnote 8).
In general, administrative overhead is directly related to the frequency of key rotation. Worst-case revocation speed can be derived from the key lifetime, $T_k$, the certificate lifetime, $T_c$, and the key distribution lead time, $T_d$ (see footnote 9). Loss of access to new data begins to take effect no later than $T_k + T_d$ after revocation and requires an additional $T_c$ to complete.
For example, assume we generate certificates with one-year validity periods, change keys annually, and distribute keys one month before they are required. This approach causes very little additional overhead (just annual updates), but revocation speed is quite slow, with revoked parties unaffected for up to fourteen months and retaining access to some data for up to twenty-six months after revocation (see footnote 10). Similarly, when issuing six-month certificates with monthly key rotation and one-month key distribution lead time, revocation begins to take effect after two months and leaves revoked parties completely without access no more than six months later. However, this comes at a substantially greater cost in administrative overhead.
One variation is to trade scheduled key rotation for reactive key rotation that only occurs when there has been a revocation. Each key rotation is typically more expensive since it was unplanned, but there may be fewer overall key changes as a result. In this case, revocation begins to take effect after just $T_d$ and requires another $T_c$ to complete. Unfortunately, each revocation of a single party requires all parties to change keys - a tremendous administrative burden.
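The worst-case timing for scheduled and reactive rotation can be computed directly from the three durations defined above. The sketch below is a plain transcription of those bounds; the type and function names are hypothetical, and all arguments are assumed to be in the same unit (months in the examples).

```python
from typing import NamedTuple


class RevocationWindow(NamedTuple):
    starts_by: float    # latest point at which loss of access begins
    complete_by: float  # latest point at which all access is lost


def scheduled_rotation(key_lifetime: float, cert_lifetime: float,
                       lead_time: float) -> RevocationWindow:
    """Fixed-schedule rotation: key lifetime plus lead time, then one cert lifetime."""
    start = key_lifetime + lead_time
    return RevocationWindow(start, start + cert_lifetime)


def reactive_rotation(cert_lifetime: float,
                      lead_time: float) -> RevocationWindow:
    """Rotation triggered by a revocation: lead time, then one cert lifetime."""
    return RevocationWindow(lead_time, lead_time + cert_lifetime)
```

For the annual example, `scheduled_rotation(12, 12, 1)` gives 13 and 25 months, which sits within the "up to fourteen" and "up to twenty-six" month figures quoted above. For the monthly example, `scheduled_rotation(1, 6, 1)` gives 2 and 8 months, matching the stated two months plus six more.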
These approaches cause little message expansion, but significant administrative overhead. Additionally, they leave revoked parties with complete access for a significant period of time or penalize all parties when a single party is revoked. Fortunately, it is possible to reduce administrative overhead while improving revocation speed.
As the subset of trusted relying parties grows or changes more frequently, it may be more efficient and more effective to randomly split the trusted parties into key groups. All parties within a single key group share a common key encryption key.
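The data is first encrypted using a randomly-generated, per-certificate data encryption key and that key is encrypted with each group's key encryption key. This potentially includes some spare keys which are not initially distributed to any parties. The resulting message is composed as $C = E_{K_d}(M) \,\|\, E_{K_1}(K_d) \,\|\, \cdots \,\|\, E_{K_{n+s}}(K_d)$, where $K_d$ is the data encryption key and $K_1, \ldots, K_{n+s}$ are the key encryption keys of the $n$ groups and $s$ spares.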
When a party's access is revoked, their group's key is removed from use (in future messages their key is used to encrypt a fixed, worthless value rather than the actual data encryption key). The remaining authorized parties within that group are given one of the spare keys or, if no spare keys remain, another group's key. This effectively merges them into a new group.
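The bookkeeping behind group keys and spares can be sketched as follows. This models only key assignment and revocation; wrapping the data encryption key under each active key is left out because it needs a real cipher. The class `GroupKeyManager`, its attributes, and the round-robin split after a shuffle are illustrative assumptions.

```python
import random
import secrets


class GroupKeyManager:
    def __init__(self, parties, n_groups, n_spares, key_bytes=16, seed=None):
        rng = random.Random(seed)  # used only for the random split into groups
        shuffled = list(parties)
        rng.shuffle(shuffled)
        # group id -> key encryption key currently in use
        self.group_keys = {g: secrets.token_bytes(key_bytes)
                           for g in range(n_groups)}
        # spare keys, not yet distributed to any party
        self.spare_keys = [secrets.token_bytes(key_bytes)
                           for _ in range(n_spares)]
        # keys removed from use; their slots would carry a worthless value
        self.retired_keys = []
        # party -> group id
        self.membership = {p: i % n_groups for i, p in enumerate(shuffled)}
        self._next_group_id = n_groups

    def key_for(self, party):
        return self.group_keys[self.membership[party]]

    def revoke(self, party):
        old_group = self.membership.pop(party)
        self.retired_keys.append(self.group_keys.pop(old_group))
        survivors = [p for p, g in self.membership.items() if g == old_group]
        if not survivors:
            return
        if self.spare_keys:
            # Hand the remaining members one of the spare keys.
            new_group = self._next_group_id
            self._next_group_id += 1
            self.group_keys[new_group] = self.spare_keys.pop()
        elif self.group_keys:
            # No spares left: merge the survivors into an existing group.
            new_group = next(iter(self.group_keys))
        else:
            raise RuntimeError("no keys left to reassign surviving parties")
        for p in survivors:
            self.membership[p] = new_group
```

Only the members of the revoked party's group need new key material, which is how the scheme localizes the cost of a revocation.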
Compared to simple key rotation, group keys reduce administrative overhead and increase independence of the parties by localizing the effect of a revocation. Additionally, worst-case revocation speed matches the best case when using key rotation alone. In the worst case, a revoked party begins to lose access to some data after $T_d$ and has lost all access after $T_c + T_d$.
The cost of group key management is primarily borne in message expansion, and size constraints limit us to a relatively small number of groups. Generally speaking, the maximum number of group keys, $n+s$, given a total space of $S_t$, a plaintext message of size $S_m$, and a key of size $S_k$, is $n+s = \lfloor (S_t - S_m) / S_k \rfloor$. For example, a 250 byte message within a 1000 byte space would allow for 46 independent 128-bit keys.
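The capacity bound above is a single integer division. A minimal sketch, with all sizes in bytes and a hypothetical function name:

```python
def max_group_keys(total_space: int, message_size: int, key_size: int) -> int:
    """Largest number of group keys (including spares) that fits in the space."""
    if key_size <= 0:
        raise ValueError("key_size must be positive")
    if message_size > total_space:
        raise ValueError("message does not fit in the available space")
    # Floor of (total space - message size) / key size.
    return (total_space - message_size) // key_size


# The example from the text: 250-byte message, 1000-byte space, 128-bit keys.
print(max_group_keys(1000, 250, 16))  # prints 46
```

The bound counts only the raw key bytes, so any per-key padding or framing overhead would lower it further.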
Much more sophisticated approaches are used in the fields of copy protection [11] and broadcast and multicast key management [20,21].
Finally, it may make sense to depart from the X.509 identity certificate approach completely, putting control of the information in the user's hands.
One option, though not widely supported, is to issue X.509 attribute certificates for the user's attributes. These certificates typically bind subject attributes to an entity, rather than an entity to a public key [10]. Used in conjunction with identity certificates, they would allow users to authenticate with their identity certificate and then present the appropriate attribute certificate as necessary. Often, there may be one attribute certificate that contains all attributes. A more flexible approach is to generate one certificate per attribute. This allows the user to choose which attributes to disclose and which to hold private. Ease of use would depend on application support for managing groups of related attributes. X.509, however, expects the user to first authenticate their identity and then authenticate their attributes, but not the latter without the former.
One variation on this concept simply binds attributes to a public key and foregoes the identity concept altogether. This allows the user to present the truly required data (e.g., ``Is the presenter authorized?'' or ``Do their attributes meet certain criteria?'') without actually forcing them to reveal their identity. Brands developed an early proposal of this idea in 1993 [2] based on many of the themes in Chaum's work. This same concept is central to the Simple Public Key Infrastructure (SPKI) standards developed by Ellison and others [8].
These approaches give control to the user, who has the economic incentives to exert granular control over the information.
The inclusion of the CZAG extension in subscriber certificates is representative of a more widespread temptation to use X.509v3 certificates as a carrier for many kinds of subject attributes not needed for the actual purpose of the certificate. Although subscribers had the opportunity to opt-out of the feature, most did not. It is unlikely that many of these users realized their personal data was not just available to a few participating web sites, but was published on the Internet where it was readable by anyone given trivial effort.
Organizations planning and deploying both open and closed public key infrastructures are frequently expected to embed sensitive or potentially sensitive information in user certificates. In the face of such requirements, designers must carefully consider the risks when developing an appropriate certificate profile. Failure to do so may allow private data to leak outside the intended scope.
We sincerely appreciate the efforts of everyone who provided feedback on early predecessors of this paper, including Taher Elgamal, Mark Chen, and Mark Schertler. The anonymous reviewers' comments were helpful in developing the paper's final structure. Many thanks, also, to Peter Gutmann who originally recommended polishing up the paper for submission and whose dumpasn1 tool is incredibly handy.
1 Work performed at Securify, Inc.
2 We even saw cases where users with pseudonymous e-mail addresses (e.g., hackerd00d@example.com) provided a verifiable full name, date of birth, and zip code.
3 DER stands for the ASN.1 Distinguished Encoding Rules.
4 ASN.1 stands for Abstract Syntax Notation One.
5 There is a very small possibility of a false positive if the sequence appears in a public key, signature, or another extension's payload.
6 A user's base-64 encoded certificate can be retrieved with the command line: ldapsearch -h directory.verisign.com -b "" mail=<email> usercertificate;binary
7 Such systems are surprisingly common amongst closed public key infrastructures.
8 Revocation speed is the time required for a revocation event to result in loss of access for the revoked party.
9 Key distribution lead time is how long prior to using a particular key we distribute it to trusted parties.
10 For example, consider a party who becomes unauthorized after the following year's key has been distributed but before it has been put into use (e.g., Dec 15, 2001). A certificate issued under that key on the last day of the key's use (e.g., Dec 31, 2002) will remain valid and in circulation through Dec 30, 2003 (for a one-year validity period), and the now-unauthorized party will have had access to data for just over two years after their authorization was revoked.
This paper was originally published in the Proceedings of the 11th USENIX Security Symposium, August 5-9, 2002, San Francisco, CA, USA.
Last changed: 19 June 2002