HTTP Caching Proxy

Introducing My HTTP Caching Proxy Project: A High-Performance, Scalable Solution

Introduction

As part of my efforts to deepen my understanding of networking, HTTP protocols, and high-performance server development, I built an HTTP caching proxy server as a course project. The proxy handles HTTP requests and responses efficiently while caching results to improve response times. It is written entirely in C++ and leverages techniques such as multi-threading, I/O multiplexing with epoll, and an LRU (Least Recently Used) cache.

What is an HTTP Caching Proxy?

In simple terms, an HTTP proxy server acts as an intermediary between a client (such as a web browser) and an origin server (like a website). It forwards requests from the client to the server and sends back the server's response. By caching frequently requested content, a proxy server can significantly reduce the time it takes to fetch resources from the origin server and minimize bandwidth usage.

In my project, the caching proxy not only forwards requests and responses but also selectively caches responses to GET requests that return a 200 OK status, checking cache validity against expiration times and re-validation strategies. This reduces the number of requests that must be sent to the origin server and speeds up browsing for users.
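To make that policy concrete, here is a minimal sketch of the cacheability check (the HttpResponse struct and its field names are placeholders for illustration, not the project's actual types):

```cpp
#include <string>

// Illustrative parsed-response type; the field names are placeholders,
// not the project's actual types.
struct HttpResponse {
    std::string method;         // method of the originating request
    int status_code;            // e.g. 200
    std::string cache_control;  // raw Cache-Control header value, if any
};

// A response is a cache candidate only if it answers a GET with 200 OK
// and the origin server has not forbidden storing it.
bool is_cacheable(const HttpResponse& resp) {
    if (resp.method != "GET" || resp.status_code != 200) return false;
    if (resp.cache_control.find("no-store") != std::string::npos) return false;
    if (resp.cache_control.find("private") != std::string::npos) return false;
    return true;
}
```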

Key Features and Architecture

1. High-Performance Design

I implemented the proxy server with multi-threading to handle multiple concurrent connections efficiently. By employing the epoll API for I/O multiplexing, the server can handle thousands of requests simultaneously without blocking. The server is designed using the master-slave Reactor pattern to separate the management of incoming connections from the actual request handling. This design ensures that the server scales well and remains responsive under high load.
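Here is a simplified sketch of the epoll-driven accept loop at the heart of the main reactor (the worker hand-off is shown as a placeholder comment; thread_pool and handle_connection are assumed names, not the project's actual API):

```cpp
#include <sys/epoll.h>
#include <sys/socket.h>

// Simplified main-reactor loop: the listening socket is registered with
// epoll, and readiness events are dispatched without blocking on any
// single connection.
void run_reactor(int listen_fd) {
    int epfd = epoll_create1(0);

    epoll_event ev{};
    ev.events = EPOLLIN;
    ev.data.fd = listen_fd;
    epoll_ctl(epfd, EPOLL_CTL_ADD, listen_fd, &ev);

    epoll_event events[64];
    while (true) {
        int n = epoll_wait(epfd, events, 64, -1);
        for (int i = 0; i < n; ++i) {
            if (events[i].data.fd == listen_fd) {
                // New client: accept it and watch it for read events.
                int client_fd = accept(listen_fd, nullptr, nullptr);
                epoll_event cev{};
                cev.events = EPOLLIN;
                cev.data.fd = client_fd;
                epoll_ctl(epfd, EPOLL_CTL_ADD, client_fd, &cev);
            } else {
                // Ready connection: hand it to a worker thread.
                // (thread_pool / handle_connection are placeholder names.)
                // thread_pool.submit([fd = events[i].data.fd] { handle_connection(fd); });
            }
        }
    }
}
```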

2. Efficient Caching with LRU

A core feature of the proxy is its caching mechanism. The server uses an LRU (Least Recently Used) cache: recently accessed resources are kept in memory, while the entries that have gone unused the longest are evicted first. This keeps the cache within its memory limits while maximizing the hit rate. Additionally, cached responses are validated based on Cache-Control headers and ETags, consistent with the HTTP specification.
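The core LRU idea fits in a few dozen lines: a doubly linked list keeps entries in recency order while a hash map provides O(1) lookup. The following is a minimal single-threaded sketch that stores responses as plain strings; the real cache also tracks expiration metadata and needs a mutex around get/put to be safe across threads:

```cpp
#include <cstddef>
#include <list>
#include <optional>
#include <string>
#include <unordered_map>
#include <utility>

// Minimal LRU cache: a doubly linked list keeps keys in recency order
// (front = most recently used) and a hash map gives O(1) lookup.
class LruCache {
    using Entry = std::pair<std::string, std::string>;  // {key, response}
    std::size_t capacity_;
    std::list<Entry> items_;
    std::unordered_map<std::string, std::list<Entry>::iterator> index_;

public:
    explicit LruCache(std::size_t capacity) : capacity_(capacity) {}

    std::optional<std::string> get(const std::string& key) {
        auto it = index_.find(key);
        if (it == index_.end()) return std::nullopt;
        // A hit moves the entry to the front: it is now most recently used.
        items_.splice(items_.begin(), items_, it->second);
        return it->second->second;
    }

    void put(const std::string& key, const std::string& value) {
        if (auto it = index_.find(key); it != index_.end()) {
            it->second->second = value;
            items_.splice(items_.begin(), items_, it->second);
            return;
        }
        if (items_.size() == capacity_) {
            // Evict the least recently used entry (the back of the list).
            index_.erase(items_.back().first);
            items_.pop_back();
        }
        items_.emplace_front(key, value);
        index_[key] = items_.begin();
    }
};
```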

3. Log Management

A robust logging system tracks every interaction with the proxy server. Each request is assigned a unique ID, and detailed logs are generated for requests, cache status, origin server interactions, and responses, so developers can monitor and debug the server's behavior. The logs are stored in /var/log/erss/proxy.log inside the container and are accessible on the host machine via a mounted volume for easy analysis.

4. Error Handling

The proxy is designed to handle various errors gracefully, including:

  • Malformed requests (400 Bad Request)

  • Server errors (502 Bad Gateway)

  • Expired or corrupted cached data

If the origin server responds with an error or corrupted data, the proxy responds with the appropriate HTTP error code. This fault tolerance ensures a reliable experience even in the face of unexpected issues.
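As a rough sketch, assembling one of these error responses looks like the following (a minimal version for illustration; the real server also logs the response under the request's unique ID):

```cpp
#include <string>
#include <sys/socket.h>

// Build and send a minimal error response; the caller closes the socket.
void send_error(int client_fd, int code, const std::string& reason) {
    std::string body = std::to_string(code) + " " + reason;
    std::string resp = "HTTP/1.1 " + std::to_string(code) + " " + reason + "\r\n"
                       "Content-Type: text/plain\r\n"
                       "Content-Length: " + std::to_string(body.size()) + "\r\n"
                       "Connection: close\r\n"
                       "\r\n" + body;
    send(client_fd, resp.data(), resp.size(), 0);
}

// Usage (illustrative):
//   send_error(fd, 400, "Bad Request");   // malformed request
//   send_error(fd, 502, "Bad Gateway");   // origin error or corrupted data
```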

5. HTTPS and Tunneling Support

The proxy also supports HTTPS requests, which require the CONNECT method. For secure connections, the proxy opens a tunnel to the origin server and forwards encrypted data back and forth. The server logs when a tunnel is opened and closed, providing transparency into these interactions.
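A tunnel loop can be sketched as follows. The sketch uses select() purely to stay short, whereas the server itself multiplexes its sockets with epoll; it simply relays opaque bytes in both directions until either side closes:

```cpp
#include <algorithm>
#include <sys/select.h>
#include <sys/socket.h>

// Relay encrypted bytes in both directions until either side closes.
// select() is used here for brevity only.
void tunnel(int client_fd, int server_fd) {
    char buf[65536];
    while (true) {
        fd_set fds;
        FD_ZERO(&fds);
        FD_SET(client_fd, &fds);
        FD_SET(server_fd, &fds);
        if (select(std::max(client_fd, server_fd) + 1,
                   &fds, nullptr, nullptr, nullptr) <= 0) break;

        int from = FD_ISSET(client_fd, &fds) ? client_fd : server_fd;
        int to   = (from == client_fd) ? server_fd : client_fd;
        ssize_t n = recv(from, buf, sizeof(buf), 0);
        if (n <= 0) break;                    // EOF or error: close the tunnel
        if (send(to, buf, n, 0) != n) break;  // short write: give up
    }
}
```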

6. Dockerized for Easy Deployment

To facilitate easy deployment and testing, I containerized the proxy using Docker. A docker-compose.yml file and Dockerfile were created to allow the proxy to run as a Docker container. The container is configured to map port 12345 on the host machine to the proxy, and logs are saved in a directory on the host system, ensuring that developers can easily track the proxy's performance and behavior.
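For reference, a docker-compose.yml along these lines captures that setup (illustrative only: the service name and host log directory are assumptions, while the port mapping and container log path come from the configuration described above):

```yaml
# Illustrative compose file: the service name and host log directory are
# assumptions; the port and container log path match the setup above.
services:
  proxy:
    build: .
    ports:
      - "12345:12345"           # host port 12345 -> proxy in the container
    volumes:
      - ./logs:/var/log/erss    # exposes /var/log/erss/proxy.log on the host
```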

Technical Details and Implementation

  • Programming Language: C++

  • Networking: Linux Sockets, TCP

  • Concurrency: Multi-threading, epoll, Thread Pool, Reactor Pattern

  • Caching: LRU Algorithm, Cache Expiration, Cache Re-validation

  • HTTP Protocols Supported: GET, POST, CONNECT (with optional support for other methods)

  • Error Handling: Graceful handling of server errors and malformed requests

  • Deployment: Docker, Docker Compose

Performance Improvements and Benchmarks

One of the key highlights of this project was improving the proxy's response time. By serving hits from the LRU cache and properly handling expired or stale entries, the proxy reduced response times by roughly 30% in my benchmarks compared to a non-caching setup.
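Handling a stale entry comes down to a conditional request: the proxy revalidates with the origin and can serve the cached body if the server answers 304 Not Modified. Here is a sketch of the header assembly (a hypothetical helper, not the project's actual function):

```cpp
#include <string>

// Build a conditional GET for revalidation: if the origin answers
// "304 Not Modified", the cached body is served without re-downloading.
// (Hypothetical helper, not the project's actual function.)
std::string build_revalidation_request(const std::string& host,
                                       const std::string& path,
                                       const std::string& etag) {
    std::string req = "GET " + path + " HTTP/1.1\r\n"
                      "Host: " + host + "\r\n";
    if (!etag.empty()) {
        req += "If-None-Match: " + etag + "\r\n";  // revalidate by ETag
    }
    req += "Connection: close\r\n\r\n";
    return req;
}
```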

The proxy was designed to handle over 10,000 concurrent connections, and thanks to the epoll-based I/O multiplexing and thread pool, it could scale to meet high demand. This makes it well-suited for production environments where performance is critical.

Logs and Debugging

The logging system I implemented provides detailed insights into the behavior of the proxy server. Every request and response is logged with a unique ID and includes information about:

  • The incoming request (method, URL, headers)

  • The cache status (whether the response was served from cache, if it was expired, etc.)

  • The server interactions (requests made to the origin server, responses received)

  • Responses sent back to the client

This logging mechanism is invaluable for debugging issues and understanding the server's internal state at any given time.
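To give a feel for the output, a single request's lifecycle might appear in the log roughly like this (the format shown is illustrative, not a verbatim excerpt):

```
42: "GET http://example.com/index.html HTTP/1.1" from 10.0.0.5 @ Tue Feb 27 10:15:01 2024
42: in cache, but expired at Tue Feb 27 09:00:00 2024
42: Requesting "GET http://example.com/index.html HTTP/1.1" from example.com
42: Received "HTTP/1.1 200 OK" from example.com
42: Responding "HTTP/1.1 200 OK"
```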

Conclusion

This HTTP Caching Proxy project allowed me to gain hands-on experience in network programming, server design, and high-performance systems. By implementing an efficient multi-threaded proxy with caching, I was able to deliver significant performance improvements for web browsing, while ensuring robust error handling and system monitoring.
