Evaluation of Akamai's Server Allocation Scheme
Bo Yang
In this experiment, we test against Akamai's servers. One attacking machine attacks an Akamai server by opening a large number of TCP connections to it. A second machine measures the delay that normal users experience when fetching from the server under attack. A third machine monitors the load-balancing behavior of Akamai's DNS servers during the test. The details of the experiment setup are in the Appendix. The following figures show the results of the test.

Fig. 1 Delay experienced by a normal user before, during and after the attacks. The server under test is Akamai's 192.35.210.199.
The delay measurement starts before the attack. The first attack starts 55 seconds later and lasts for 209 seconds. The second attack starts 115 seconds after the first attack stops and lasts for 108 seconds. The measurement continues for 66 seconds after attack 2 stops. When the server is not under attack, the delay is mostly ~20 ms; this corresponds to the three dense stretches in the figure (before attack 1, between attack 1 and attack 2, and after attack 2). Under attack, the user usually cannot establish a connection with 192.35.210.199, and the attempt times out after 5 seconds before the object is retrieved. Each such timeout is plotted as a delay of -1000 ms in Fig. 1, so the two -1000 ms stretches in the figure correspond to the two attacks. During the attacks, only a few connections went through, with delays of 3, 5, and 1 seconds.
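As a check on the schedule described above, the start and end of each attack can be computed from the stated durations. The Python sketch below does only that arithmetic; the constant and function names are ours, and phase_of is a hypothetical helper for tagging delay samples so that the -1000 ms stretches can be lined up against the attack windows.

```python
"""Attack schedule of the Fig. 1 run, rebuilt from the durations stated in the text."""

# Durations in seconds, copied from the text.
LEAD_IN = 55     # measurement runs this long before attack 1 starts
ATTACK_1 = 209   # length of attack 1
GAP = 115        # from the end of attack 1 to the start of attack 2
ATTACK_2 = 108   # length of attack 2
TAIL = 66        # measurement continues this long after attack 2 stops


def attack_windows():
    """Return ((start1, end1), (start2, end2), total), in seconds from measurement start."""
    a1 = (LEAD_IN, LEAD_IN + ATTACK_1)
    a2_start = a1[1] + GAP
    a2 = (a2_start, a2_start + ATTACK_2)
    return a1, a2, a2[1] + TAIL


def phase_of(t, windows):
    """Label a sample taken t seconds after measurement start as 'attack' or 'normal'."""
    return "attack" if any(start <= t < end for start, end in windows) else "normal"


if __name__ == "__main__":
    a1, a2, total = attack_windows()
    print(f"attack 1: {a1[0]}-{a1[1]} s, attack 2: {a2[0]}-{a2[1]} s, total: {total} s")
    # prints: attack 1: 55-264 s, attack 2: 379-487 s, total: 553 s
```

With these numbers the whole measurement lasts 553 seconds, a little over 9 minutes.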

Fig. 2 IP addresses returned by Akamai's DNS server for the query "a740.g.akamai.net" during the test. The two lines at y = 0 mark the periods of attack 1 and attack 2, respectively (each line runs from the start to the end of that attack). The other lines, at y = 1 through 6, each represent one IP address returned by Akamai's DNS.
During the normal period without an attack, 192.35.210.199 (referred to as 199 below) is always returned in the answer. After the attack starts, however, Akamai's DNS server no longer returns 199. This is clearly load balancing at work: the load balancer monitors a group of servers and decides which of them will get work in the next time slice, and the DNS server returns the IP addresses of those servers. How soon Akamai's DNS server removes 199 from its answers reflects how often load information is exchanged between 199 and Akamai's DNS server/load balancer. The interval we have observed so far ranges from 1 second to 30 seconds. Probably the web server reports its load to the load balancer at least once every 30 seconds. The reporting frequency might also depend on the severity of the load, with more frequent updates when the load is high; it is also possible that a high load triggers an immediate update message to the load balancer.
From the test above, we see that the load balancer chose among servers 199, 206, 207, 208, 150, and 180. One interesting observation is that while servers 199, 206, 207, and 208 are quite "close" to the Stanford campus network (8 hops away, RTT ~3 ms), 150 and 180 are 13 hops away with an RTT of ~40 ms. This means that at some point during attack 2, the load balancer could not find a "local" server with enough spare resources to take more connections, and had to fall back on "remote" servers such as 150 and 180 (a sort of "global" load balancing, although perhaps not a very "global" one).
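The removal interval discussed above can be read off the DNS log mechanically. The Python sketch below assumes the log is a list of (seconds, set of servers) pairs, one per DNS poll, with servers written by the short names used in the text; the function names and the toy log are hypothetical, and the toy log is not measured data. With one poll every 5 seconds, as in the Appendix, an interval computed this way is only resolved to within one polling period.

```python
from typing import Iterable, Optional, Set, Tuple

# One DNS poll: (seconds since start of test, set of server names in the answer).
Sample = Tuple[float, Set[str]]


def removal_delay(log: Iterable[Sample], server: str, attack_start: float) -> Optional[float]:
    """Seconds from attack_start to the first answer that no longer contains `server`.

    Returns None if every answer at or after attack_start still contains it.
    """
    for t, answer in sorted(log, key=lambda sample: sample[0]):
        if t >= attack_start and server not in answer:
            return t - attack_start
    return None


def servers_seen(log: Iterable[Sample], start: float, end: float) -> Set[str]:
    """Union of all servers handed out by the DNS with start <= t < end."""
    seen: Set[str] = set()
    for t, answer in log:
        if start <= t < end:
            seen |= answer
    return seen


if __name__ == "__main__":
    # Hypothetical toy log for illustration only; NOT data measured in this test.
    toy_log = [
        (0, {"199", "206"}),
        (5, {"199", "206"}),
        (10, {"199", "207"}),
        (15, {"206", "207"}),
        (20, {"150", "180"}),
    ]
    print(removal_delay(toy_log, "199", attack_start=8))  # 7
    print(sorted(servers_seen(toy_log, 8, 25)))  # ['150', '180', '199', '206', '207']
```

The second function is the kind of summary behind the list of servers 199, 206, 207, 208, 150, and 180 given above: the union of everything the DNS handed out while an attack was running.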
The results also suggest that 199 is a single machine rather than a group of servers sitting directly behind a load-balancing-capable IP router.
We have also constructed a rough map of the hierarchy of Akamai's DNS servers, built as follows. First, we obtain the list of top-level root DNS servers ([A-M].root-servers.net). We then query the root servers for "a740.g.akamai.net"; their replies give the level-2 DNS servers in charge of .net. Querying the level-2 DNS servers gives Akamai.net's top-level DNS servers. The process is repeated, and three levels down in Akamai's DNS, the IP address of "a740.g.akamai.net" is obtained. DNS servers at the same level are deployed in different geographic areas (and at different network access points), and querying servers at the same level but in different areas returns different answers. Each server assumes that a query comes from its own geographic area and answers with the next-level servers that are "local" to it (setting load balancing aside for now). Thus, normal clients can be served by an Akamai server that is "close" to them.
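The walk described above is ordinary iterative DNS resolution: ask a server, and if it answers with a referral, move one level down to the servers it names. The Python sketch below replays that loop over a hypothetical in-memory delegation table instead of the network, so every server name and address in it is a placeholder, not one of Akamai's real servers, and the exact number of levels is only illustrative. Against real servers, each step would correspond to a non-recursive query such as `dig +norecurse @<server> a740.g.akamai.net`.

```python
from typing import Callable, Dict, List, Tuple, Union

# A reply is either a referral (next-level server names) or a final answer (an IP string).
Reply = Union[List[str], str]


def walk(qname: str, start: List[str], ask: Callable[[str, str], Reply],
         max_depth: int = 10) -> Tuple[List[List[str]], str]:
    """Follow referrals from `start` down to an address record.

    Queries the first server at each level (one path through the hierarchy) and
    returns the server list seen at every level plus the final answer.
    """
    levels: List[List[str]] = []
    servers = start
    for _ in range(max_depth):
        levels.append(servers)
        reply = ask(servers[0], qname)
        if isinstance(reply, str):
            return levels, reply
        servers = reply  # referral: descend one level
    raise RuntimeError(f"no answer for {qname} after {max_depth} referrals")


# Hypothetical delegation table standing in for real DNS replies. All names are
# placeholders and the addresses come from the documentation ranges of RFC 5737.
TABLE: Dict[str, Reply] = {
    "root-a": ["net-1", "net-2"],                      # root refers to .net servers
    "net-1": ["akamai-top-west", "akamai-top-east"],   # .net refers to akamai.net
    "akamai-top-west": ["akamai-l2-west"],
    "akamai-l2-west": ["akamai-l3-west"],
    "akamai-l3-west": "192.0.2.10",
    "akamai-top-east": ["akamai-l2-east"],
    "akamai-l2-east": ["akamai-l3-east"],
    "akamai-l3-east": "198.51.100.20",
}


def ask_table(server: str, qname: str) -> Reply:
    """Fake non-recursive query; this toy table ignores qname."""
    return TABLE[server]


if __name__ == "__main__":
    levels, ip = walk("a740.g.akamai.net", ["root-a"], ask_table)
    print(len(levels), ip)  # 5 192.0.2.10
    # Starting from a top-level server in another "area" yields a different answer.
    print(walk("a740.g.akamai.net", ["akamai-top-east"], ask_table)[1])  # 198.51.100.20
```

The two calls in the usage example mimic the observation that same-level servers in different areas give different answers: only the starting server differs, yet the final address does too.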
Conclusion: The number of simultaneous connections that the Akamai server 192.35.210.199 can support is likely in the neighborhood of 4000-8000. Our attack used 4000 connections; since the server had presumably already accepted some connections from other clients before the attack, we widened this to 4000-8000 as a safer estimate. The server's actual capacity could be well below 8000.
Appendix:
Experiment Setup: To obtain the IP addresses of Akamai's servers, we used the "Akamaized" URL of an image object (http://a740.g.akamai.net/f/740/606/1d/image.pathfinder.com/time/daily/2001/0105/golf0529.jpg). We then looked up the IP addresses for the name "a740.g.akamai.net", picked one of the two IP addresses returned by Akamai's DNS server, and started "attacking" that server. The attacker program runs on a PC with a Pentium 933 MHz processor and 256 MB of RAM. The attack establishes 4000 connections with the server, and each of the 4000 connections requests the object mentioned above (19 KB). The 4000 connections can be established within 10-15 seconds. After sending its request, each connection is kept open on the client side. Once a legitimate request has been sent, as in our case, the server does not time the connection out for 8 minutes (we tested this by using telnet to send a single request). Even if the server reduces this timeout when the number of connections is large, the results in Fig. 1 show that our workload is enough to affect the server's performance.
The measurement programs run on two Sun Ultra 60 workstations with more than 256 MB of RAM. The first one measures the end-to-end delay experienced by normal users. Its test program requests the object mentioned above directly from the IP address of the server under test and records the delay of each object transfer. If the object is not obtained within 5 seconds, a timeout occurs: the old connection is closed, a new connection is established, and a new request is sent. The other measurement machine uses dig to query Akamai's DNS servers for the name "a740.g.akamai.net" every 5 seconds and records the IP addresses returned. These DNS queries are sent directly to Akamai's DNS server to bypass Stanford's name server.
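The delay-measurement program can be approximated in a few lines of Python using the standard library's http.client. This is a minimal sketch under our own assumptions, not the program used in the test: it requests the object from the server's IP address while sending the Akamaized host name in the Host header (our assumption about how a by-IP request was made), opens a fresh connection for every fetch, treats a connection error or a fetch that takes longer than 5 seconds as a timeout, and records a timeout as -1000 ms to match Fig. 1. The names probe_once and probe_loop are hypothetical.

```python
import http.client
import time
from typing import List, Tuple

TIMEOUT_S = 5.0        # a fetch slower than this counts as a timeout (Appendix)
TIMEOUT_MS = -1000.0   # value recorded for a timeout, as plotted in Fig. 1


def probe_once(ip: str, path: str, host: str, port: int = 80,
               timeout_s: float = TIMEOUT_S) -> float:
    """Fetch one object over a new connection; return the delay in ms or TIMEOUT_MS."""
    start = time.monotonic()
    # This timeout bounds each individual socket operation (connect, send, each read).
    conn = http.client.HTTPConnection(ip, port, timeout=timeout_s)
    try:
        conn.request("GET", path, headers={"Host": host})
        resp = conn.getresponse()
        resp.read()  # time the whole object, not just the response headers
    except (OSError, http.client.HTTPException):
        return TIMEOUT_MS  # refused, reset, or a socket operation timed out
    finally:
        conn.close()  # the old connection is always closed before the next probe
    elapsed_ms = (time.monotonic() - start) * 1000.0
    # A slow transfer can pass every per-operation timeout yet exceed the limit overall.
    return elapsed_ms if elapsed_ms <= timeout_s * 1000.0 else TIMEOUT_MS


def probe_loop(ip: str, path: str, host: str, duration_s: float,
               port: int = 80) -> List[Tuple[float, float]]:
    """Probe back to back for duration_s seconds; return (offset in s, delay in ms) pairs."""
    samples: List[Tuple[float, float]] = []
    t0 = time.monotonic()
    while time.monotonic() - t0 < duration_s:
        offset = time.monotonic() - t0
        samples.append((offset, probe_once(ip, path, host, port)))
    return samples
```

In the setup described here, ip would be 192.35.210.199, host would be a740.g.akamai.net, and path would be /f/740/606/1d/image.pathfinder.com/time/daily/2001/0105/golf0529.jpg from the Akamaized URL. Plotting the returned pairs gives a figure in the style of Fig. 1.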
Future work: To get a better estimate, we could modify the experiment as follows. First, attack 199 repeatedly so that the number of connections from machines other than our attacker is very small. Here "repeatedly" means: take 199 down until the DNS no longer gives out 199's IP address to the outside world; stop the attack; once the DNS begins to include 199 in its answers again, start the attack again; and repeat. This purges old connections from other machines from 199. Then attack 199 with N connections, where N is our guess for 199's capacity. If 199 does not go down, increase N and redo the experiment from the start. If 199 does go down, reduce N a little and redo the experiment from the start. Repeat until we find the right value of N.
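The increase/decrease procedure above is a search for the largest N that 199 survives. Assuming the outcome is monotone in N (if 199 goes down under N connections, it also goes down under any larger N), the same search can be done by bisection, which needs roughly log2(hi - lo) full experiment runs. Bisection is our substitution, not the procedure proposed above, and the Python sketch below is only an outline: goes_down is a hypothetical callback standing for one complete run (purge 199, then attack it with N connections), and lo and hi are bounds the caller must already have confirmed.

```python
from typing import Callable


def max_survivable(goes_down: Callable[[int], bool], lo: int, hi: int,
                   tolerance: int = 1) -> int:
    """Bisect for the largest N the server survives, to within `tolerance`.

    Requires goes_down(lo) is False, goes_down(hi) is True, and that the outcome
    is monotone in N. Each call to goes_down stands for one complete experiment run.
    """
    if tolerance < 1:
        raise ValueError("tolerance must be at least 1")
    if goes_down(lo) or not goes_down(hi):
        raise ValueError("need goes_down(lo) == False and goes_down(hi) == True")
    while hi - lo > tolerance:
        mid = (lo + hi) // 2
        if goes_down(mid):
            hi = mid  # went down: capacity is below mid
        else:
            lo = mid  # survived: capacity is at least mid
    return lo  # known to survive; the true limit is below hi


if __name__ == "__main__":
    # Stand-in for a real run that pretends the server fails above 6500 connections.
    # The threshold is made up for illustration; it is not a measured value.
    def fake_run(n: int) -> bool:
        return n > 6500

    print(max_survivable(fake_run, 4000, 8000))  # 6500
```

Real runs are noisy, since the load from other clients varies, so each trial near the boundary might need to be repeated before its outcome is trusted; the original "reduce N a little" loop tolerates that noise more gracefully than strict bisection.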