When Good Services Go Wild: Reassembling Web Services for Unintended Purposes

Feng Lu, Jiaqi Zhang, Stefan Savage
University of California, San Diego

Abstract

The rich nature of modern Web services and the emerging "mash-up" programming model make it difficult to predict the potential interactions and usage scenarios that can emerge. Moreover, while the potential security implications for individual client browsers have been widely internalized (e.g., XSS, CSRF, etc.), there is less appreciation of the risks posed in the other direction: of user abuse on Web service providers. In particular, we argue that Web services and pieces of services can be easily combined to create entirely new capabilities that may themselves be at odds with the security policies that providers (or the Internet community at large) desire to enforce. As a proof of concept we demonstrate a fully functioning Web proxy service called CloudProxy. Constructed entirely out of pieces of unrelated Google and Facebook functionality, CloudProxy effectively launders a user's connection through these providers' resources.

1 Introduction

The modern Web service ecosystem is one built on composition, using Web-based server APIs and client JavaScript to create new services and capabilities. With a few lines of code one can connect a Google Maps widget to a Facebook app with a Microsoft data source and, in so doing, "mash up" an entirely new Web service neither envisioned, nor endorsed, by any of the individual service providers. Overall, this programming environment promotes reuse and agility, but inevitably at the expense of encapsulation or clean semantic guarantees. It is no surprise, then, that this dynamic style of programming can create new security risks: cross-site scripting, cross-site request forgery, and so on. However, to date most of the attention on these problems has focused on violations of the client's security policy: what can a Web site do to a browser? [10, 13, 14]
In this paper, we argue that there is another class of security concerns, involving the difficulty of policy enforcement by the providers of Web services. As a trivial example, Google's GMail was designed and intended as a free e-mail service, but systems such as gmailfs [4] bypass this intent by wrapping the APIs to create a free file system service. Similarly, the Graffiti network abusively implements file storage using blog spam on open forums [11]. However, these simple examples belie the potential complexity that can arise from exploiting combinations of services, both within and across providers. Many modern Web services have both local and remote side effects: they can fetch objects from other sites, change local state, and invoke additional services encapsulated as arguments. In this manner, a user may leverage the reputation of a collection of Web service providers to engage with a third party on their behalf.

In this paper, we explore this issue by example, demonstrating the creation of a functioning Web proxy from unrelated components. In a manner metaphorically similar to Shacham's "return-oriented programming" [12], we demonstrate how pieces of correctly functioning Web services from different providers can be stitched together to create completely new functionality. The resulting free service, CloudProxy, launders a user's connection through the servers of Google and Facebook. This capability could be used to bypass IP-level access restrictions (e.g., such as are commonly used to restrict streaming video or purchasing options within geo-locked regions), to commit mass Web spam without fear of blacklisting (i.e., no Web site depending on advertising can afford to blacklist Google IPs, since Googlebot visits are what fill the Google search index), or to mount denial-of-service attacks on third parties using these providers' resources (e.g., by repeatedly downloading large objects).
In the remainder of this paper we detail the design process for building our CloudProxy service and some of the risks it creates. However, our service is less a threat in and of itself than a proof of concept that abusive Web service composition can be used to synthesize new threats from benign pieces. Moreover, since these threats can be inherently cross-domain, there are interesting new challenges in how best to address such problems.

2 Design Overview

In designing our Web proxy, we focused our efforts on the most widely used HTTP methods: GET and POST. (While other HTTP methods can be implemented in a similar way, we omit the details due to space constraints.) Our design approach is summarized as follows:

• We make use of public Web service APIs whose core functionality, either explicitly or implicitly, allows us to retrieve content given a URL.

• If necessary, each request is rewritten to meet the constraints of these Web service APIs; this pre-processing step is itself accomplished by exploiting (other) existing Web services.

• The responses from these Web services are reassembled in order to make the final response transparent to the user's Web browser.

While this approach is inelegant at times (made so in particular by our need to transform arguments to meet API restrictions or requirements), we will show that it is sufficient to construct a fully functional Web proxy, capable of handling the majority of requests that might be made by a Web browser (e.g., page views, forms, real-time video streaming, etc.).

In the remainder of this section, we explain in general how Web service APIs are repurposed, how we find appropriate APIs, and how we handle per-API constraints.

2.1 The Building Blocks of Our Implementation

Unlike normal Web servers, which implement HTTP commands directly, the building blocks for our implementation are the APIs provided by third-party Web services such as Facebook or Google.
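The three design steps above can be sketched as a small pipeline. This is a hypothetical illustration only: none of the function names below come from a real provider API, and the stub gadget merely stands in for a third-party "fetch by URL" call such as a document-import API.

```python
from urllib.parse import quote

def rewrite_request(url):
    # Step 2: rewrite the request to meet the gadget's input constraints
    # (here, simply percent-encoding the URL).
    return quote(url, safe="")

def reassemble_response(raw):
    # Step 3: undo whatever formatting the service imposed on the content
    # (here, stripping a trivial marker added by the stub gadget below).
    return raw.removeprefix("IMPORTED:")

def proxy_get(url, service_gadget):
    # Step 1: route the GET through a third-party API, so the provider
    # rather than the user contacts the target site.
    return reassemble_response(service_gadget(rewrite_request(url)))

# Stub standing in for a provider-side fetch; a real gadget would cause
# the provider's servers to retrieve the target URL.
def stub_gadget(encoded_url):
    return "IMPORTED:<html>hello</html>"

print(proxy_get("http://example.com/a b", stub_gadget))
```

A real deployment replaces `stub_gadget` with one or more service APIs, and the rewrite/reassemble steps with whatever per-API transformations those services demand.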
A typical such API is a well-documented function call. For instance, it might be provided in a public library written in a language such as Python or Java (e.g., the Google Document API [3]).

In our design, we make use of APIs that themselves perform HTTP commands; such building blocks exist widely in current Web services. For example, consider the scenario in which a user wishes to provide some input data to an online document processing service. If the data is available on the Internet (e.g., via a Web server), rather than requiring the user to download the data to local storage and then upload it again, document service APIs typically provide an interface for users to specify the Internet location of the content, which is then downloaded directly to the service provider. This "loads for" capability suggests that such document service APIs may provide a likely base "gadget" for building a Web retrieval interface.

2.2 Discovering the Useful APIs

The APIs we target can be provided explicitly as the major functions of a popular Web service (e.g., a URL shortening service), or implicitly as an ingredient of such an original service. In the former case, the work of finding the APIs devolves to enumerating the set of services and their attendant side effects (e.g., as with the document API example) to find an appropriate set of candidates. On the other hand, if the needed API is not provided explicitly, or is provided with undesirable restrictions, additional transformation may be required to provide the desired functionality. For example, the Google Spreadsheet service supports importing images and displaying them on the Web. However, if one wants to fetch an individual image using this service, it may require decomposing the service into its constituent parts to identify the key unbundled piece of functionality.
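The import-by-URL pattern discussed above can be made concrete with a small mock. The service, fetcher, and data below are invented for this sketch and correspond to no real provider API; the point is that, operationally, such an import endpoint is an HTTP GET performed by the provider on the caller's behalf.

```python
# A fake "Web" so the sketch runs without network access.
FAKE_WEB = {"http://example.com/data.csv": "a,b\n1,2"}

def provider_fetch(url):
    # Stand-in for the outbound HTTP GET that the provider performs.
    return FAKE_WEB[url]

class MockDocumentService:
    """Intended use: populate a document from an online CSV file.
    Unintended side effect: the raw content of *any* URL the provider
    can reach flows back through the service, giving the caller a
    Web-retrieval gadget."""

    def import_from_url(self, url):
        return provider_fetch(url)

svc = MockDocumentService()
print(svc.import_from_url("http://example.com/data.csv"))
```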
Given the limited structure imposed on Web APIs, we do not currently have a good mechanism for systematically and automatically detecting all possible building blocks. However, we observe that our own modest knowledge, combined with some manual investigation, has been sufficient for the goals of this paper. As major providers regularize their APIs into well-formed catalogs, this process will only become easier, and we further believe that implicit APIs will be susceptible to automated discovery and enumeration through techniques such as program analysis of Web service JavaScript.

2.3 Adjusting the Input and Output

Finally, standard HTTP requests can include URLs and other parameters, which are represented as strings. However, these strings may not always meet the input requirements of the APIs we wish to use. In this case, we need to craft the request string so that it is recognizable by the target APIs. There is a similar problem on output: depending on the API we use, the output may be transformed to match the formatting rules of the API's specifications. For these situations, we must implement the reverse transformation and normalize the content representation between service interfaces. Surprisingly, we have found that even these transformations can be performed entirely using third-party Web service APIs (detailed in Section 3).

3 Implementation

In this section, we detail the concrete steps involved in making our proxy implementation fully functional. Specifically, our goal is to reimplement the functionality of HTTP GET and POST, as well as to obtain the final URL of a Web object. We next explain how we implement each piece of functionality in turn, including identifying its requirements, finding appropriate service APIs and, when necessary, how services are composed to work around per-API restrictions.

3.1 HTTP GET

Retrieving Web content is typically accomplished with the HTTP GET method.
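For reference, the behavior being reimplemented through third-party services is an ordinary GET. A minimal sketch of the request a browser would emit, constructed by hand so that no network access is needed:

```python
def build_get_request(host, path):
    # Construct the HTTP/1.1 request line and minimal headers for a GET;
    # this is the message a browser (or proxy) would normally send to
    # the origin server directly.
    return (f"GET {path} HTTP/1.1\r\n"
            f"Host: {host}\r\n"
            f"Connection: close\r\n"
            f"\r\n")

print(build_get_request("example.com", "/index.html"))
```

CloudProxy's task is to reproduce the effect of this message, i.e. retrieval of the named object, without the user's machine ever sending it to the origin server.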
We first describe how regular ASCII HTML content is fetched and then discuss how non-ASCII content requires its own work-arounds.

[Figure 1: Importing a Webpage in Spreadsheet. When a Webpage is loaded into Google Spreadsheet, the raw content of the HTML page is displayed.]

3.1.1 ASCII-Based Content

As discussed in the design section, most Web services aim to process all data inside the cloud and provide only the final results to end users. However, there is a fundamental similarity between a service retrieving online Web objects for cloud processing and a browser retrieving online Web objects for display: both require a full implementation of HTTP GET.

The first such API that came to our attention is the ImportData(.) function provided by Google Spreadsheet. It is designed to let users quickly populate their spreadsheets from online CSV or TSV files, and it takes a single parameter: the URL of the CSV or TSV file. Interestingly, we notice that this function can be used to retrieve any Web object, not only spreadsheets. Figure 1 shows a screenshot of retrieving a Webpage in a spreadsheet.

Unfortunately, as Figure 1 shows, Web content fetched via this interface is split across multiple spreadsheet cells rather than staying in a single cell. In particular, the newline character '\n' splits the content across rows, while the comma ',' splits content across multiple columns in