PhD Thesis

Network caching is a promising technique to cope with today's Internet traffic explosion and to help sustain the demand for an ever-increasing user quality of experience. Nonetheless, the techniques proposed so far in the literature do not exploit all of its potential benefits. Indeed, previous work usually aims to optimize the hit ratio or other network-centric metrics, e.g. path length, latency, etc., which are relevant for the research community. Network operators, on the other hand, are focused on more practical metrics, like cost and quality of experience. The usual approach in the literature is to target network-centric metrics and then, in some cases, to incidentally observe the benefits induced on the practical metrics mentioned above. Our approach is different, in that we devise caching techniques that directly target the latter objectives, and we show that this yields better performance.

More specifically, we first propose novel caching techniques that reduce the operational cost of the Internet Service Provider (ISP) by preferentially caching the objects whose retrieval cost is the largest. We call these techniques \emph{Cost-Aware Caching} strategies. We formalize the problem as an Integer Linear Program (ILP) and provide a greedy algorithm that computes the optimal solution. Numerical results show a trade-off between classic hit-ratio maximization and cost reduction, and give a theoretical bound on the cost benefit that caching can bring. We apply what we learned from the ILP to devise a novel online distributed policy that can be implemented in real networks, and we provide a probabilistic model of it based on Che's approximation.
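As a minimal illustration of the kind of model Che's approximation yields (a sketch only, assuming an LRU cache with unit-size objects and Poisson request rates, which may differ from the thesis model), the per-object hit probabilities can be computed by solving for the characteristic time $T_C$ such that $\sum_i (1 - e^{-\lambda_i T_C}) = C$:

```python
import math

def che_hit_probabilities(lam, cache_size, tol=1e-9):
    """Per-object hit probabilities of an LRU cache under Che's approximation.

    lam: per-object request rates; cache_size: capacity in objects.
    Assumes unit-size objects and cache_size < len(lam).
    """
    def filled(T):
        # Expected number of distinct objects requested within time T,
        # i.e. the expected cache occupancy at characteristic time T.
        return sum(1.0 - math.exp(-l * T) for l in lam)

    lo, hi = 0.0, 1.0
    while filled(hi) < cache_size:      # bracket the characteristic time T_C
        hi *= 2.0
    while hi - lo > tol:                # bisection on sum_i (1 - e^{-lam_i T}) = C
        mid = (lo + hi) / 2.0
        if filled(mid) < cache_size:
            lo = mid
        else:
            hi = mid
    T = (lo + hi) / 2.0
    return [1.0 - math.exp(-l * T) for l in lam]
```

For instance, with ten equally popular objects and a cache of five, every object's hit probability is 0.5, and the hit probabilities sum to the cache size, as the approximation requires.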
By means of large-scale simulation, we compare the achieved benefit to the theoretical bound found via the ILP, and we show the robustness of our solution in different scenarios.

We then observe that cost minimization cannot be the sole objective of an ISP; indeed, user quality of experience is another decisive factor. We focus on video delivery, since it is the most sensitive to user experience and represents most of the Internet traffic. Although video is highly cacheable thanks to its inherent redundancy, classic caching techniques ignore its relevant peculiarities, the most important of which is that each video is available as different files, which we call ``representations'', encoded at different bit-rates and resolutions. We introduce the \emph{Representation Selection Problem}, which consists in selecting the right representations among those available when caching videos, a problem which has been neglected so far. We formulate the problem as a Mixed Integer Linear Program (MILP) and highlight some structural properties of the optimal solution, which guide us in designing an online distributed policy, implementable in real networks, that we show, by means of large-scale simulation, to effectively balance user-perceived utility and bandwidth usage.

Finally, we point out that the techniques presented in the literature assume perfect knowledge of the objects crossing the network. This assumption no longer holds, since most of today's traffic is encrypted between the user and the Content Provider (CP). As a consequence, all network caching techniques are practically inapplicable by ISPs. To overcome this limit, we propose Content-Oblivious Caching, which allows ISPs to cache even though they cannot observe the objects being served.
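To make the Representation Selection Problem introduced above concrete, here is a hypothetical greedy sketch (not the thesis MILP or its online policy): each video may be cached in at most one representation, and representations are picked by utility-per-byte under a storage budget; utilities, sizes, and the budget are illustrative assumptions.

```python
def select_representations(videos, budget):
    """Greedy representation selection under a storage budget.

    videos: {name: [(utility, size_bytes), ...]} listing the available
    representations of each video; returns {name: chosen_rep_index}.
    At most one representation per video is cached (toy constraint).
    """
    # Flatten all candidates as (utility density, utility, size, video, index).
    cands = []
    for name, reps in videos.items():
        for idx, (u, s) in enumerate(reps):
            cands.append((u / s, u, s, name, idx))
    cands.sort(reverse=True)                 # best utility-per-byte first
    chosen, used = {}, 0
    for _dens, _u, s, name, idx in cands:
        if name in chosen or used + s > budget:
            continue                         # one representation per video
        chosen[name] = idx
        used += s
    return chosen
```

For example, given video "a" with representations of utility/size (10, 5) and (6, 2) and video "b" with (4, 4), a budget of 6 selects the compact representation of "a" and the only representation of "b".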
The ISP allocates the cache storage to the various content providers so as to maximize the bandwidth savings provided by the cache. The main novelty lies in the fact that, to protect business-critical information, the ISP only needs to measure the aggregate miss rates of the individual CPs, and does not need to be aware of which objects are requested, as in classic caching.
We propose a cache allocation algorithm based on a perturbed stochastic subgradient method, and prove that the algorithm converges close to the allocation that maximizes the overall cache hit rate. We use extensive simulations to validate the algorithm and to assess its convergence rate under stationary and non-stationary content popularity. Our results (i) testify to the feasibility of content-oblivious caches and (ii) show that, in our evaluation, the proposed algorithm comes within 10\% of the global optimum.

Overall, our research shows that, although network caching has been investigated for more than 20 years, it is still an interesting and open research problem, with room for novel ideas and techniques. Indeed, the continuous evolution of the Internet calls for an equally continuous evolution of network caching, so that caching remains effective in improving the metrics that matter most to network operators and users, and well integrated with evolving network architectures and the Internet economic ecosystem.
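As a toy illustration of the perturbed stochastic subgradient idea described above (a sketch only: the concave hit-rate curves, the noise model, and the step sizes are assumptions, not the thesis algorithm), the ISP can estimate each CP's marginal hit rate from noisy aggregate measurements at perturbed allocations and project the updated split back onto the storage budget:

```python
import math
import random

def project_simplex(x, total):
    """Euclidean projection of x onto {y : y >= 0, sum(y) = total}."""
    u = sorted(x, reverse=True)
    css, theta = 0.0, 0.0
    for i, ui in enumerate(u, 1):
        css += ui
        if ui - (css - total) / i > 0:
            theta = (css - total) / i
    return [max(xi - theta, 0.0) for xi in x]

def optimize_partition(hit_fns, total, steps=2000, eps=0.5, lr=2.0, seed=0):
    """Search the per-CP storage split using only noisy hit measurements.

    hit_fns: stand-ins for the (unknown) per-CP hit rate as a function of
    allocated storage; the ISP only observes their noisy values.
    """
    rng = random.Random(seed)

    def measure(hp, s):
        # Noisy aggregate hit-rate measurement (noise model is an assumption).
        return hp(max(s, 0.0)) + rng.gauss(0.0, 1e-3)

    x = [total / len(hit_fns)] * len(hit_fns)   # start from an even split
    for _ in range(steps):
        # Two-point finite-difference estimate of each CP's marginal hit rate.
        g = [(measure(hp, xi + eps) - measure(hp, xi - eps)) / (2 * eps)
             for hp, xi in zip(hit_fns, x)]
        # Gradient step, then project back onto the storage budget.
        x = project_simplex([xi + lr * gi for xi, gi in zip(x, g)], total)
    return x
```

With two CPs whose hit curves are $\log(1+s)$ and $2\log(1+s)$ and a budget of 10, equalizing the marginal hit rates gives the optimal split $(3, 7)$, which the iteration approaches while always respecting the budget.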