Performance of TCP Switch

A Study by Simulation using ns

 

 

 

 

 

 

Bo Yang

March 22, 2001


Table of Contents

1      Abstract

2      Introduction

3      Architecture of TCP Switch

4      Implementation of TCP Switch in ns

5      Results and Discussions

5.1       Scenarios

5.1.1        Scenario 1

5.1.2        Scenario 2

5.1.3        Scenario 3

5.2       Simulation model

5.2.1        1-Tx-per-Client (this may be a misnomer)

5.2.2        multi-Tx-per-Client (this may be a misnomer)

5.2.3        Analysis of TCP Switch Performance Bound

5.3       Impact of web page size

5.4       Impact of quota values

5.5       Impact of number of clients

5.6       Delay vs. Cell Size

5.7       Delay vs. timeout value

6      Conclusion

7      Future Work

8      Appendix - Architecture of TCP Switch

8.1       Overview

8.2       Ingress Gateway Switch

8.2.1        Flow Classifier

8.2.2        Call admission/Resource reservation module

8.3       Egress Gateway Switch

8.3.1        Admission Control

8.3.2        Packet Reassembly

8.3.3        Classifier

8.3.4        Scheduler

8.4       Core Switch

8.5       Resource reservation signaling within a TDM cloud

8.5.1        SETUP

8.5.2        TEARDOWN

8.5.3        REJECT

8.5.4        ACK

8.5.5        State Machines

8.5.6        Control Signaling Illustration

9      Appendix - Implementation of the TCP Switch Module in ns

10        Reference

 

1         Abstract

The performance of TCP Switch networks is studied and compared with that of packet switch networks by simulation using ns. Scenarios under which TCP Switch networks may perform better or worse than packet switch networks are identified.

 

Various factors that affect the performance of a TCP Switch network are also studied. The factors of primary importance are found to be the traffic pattern, the number of clients, and the file size; they determine whether TCP Switch performs better than a packet switched network. Performance bounds of a TCP switched network, i.e. its best case and worst case, are studied. In the best case, TCP Switch significantly outperforms a normal packet switched network when the number of clients is large and the file size is big (i.e. the transfer time is long). The boundary conditions for the number of clients and the file size depend on the bottleneck link's capacity. When the number of clients is small and the file size is small, packet switched networks do better. In the worst case, TCP Switch's performance degrades to that of a packet switched network, with some extra overhead from the control signal processing for resource reservation. The performance of TCP Switch under a real traffic load is expected to fall somewhere between the best case and the worst case.

 

Quota value and timeout value also affect the performance, but they are factors of secondary importance. Cell size does not affect delay, only the amount of control messages (since only relatively small cell sizes are used here, the problem of fragmentation within a cell does not arise).

2         Introduction

 

Two trends in the current Internet motivate the design of TCP Switch.

 

The first trend is the increase in link capacity (bandwidth) at both the core and the edge of the network. In the early stages of the Internet, a design decision was made to use packet switching instead of circuit switching because of its bandwidth efficiency and robustness. Bandwidth efficiency was important at that time because network nodes clearly outpaced the links that interconnected them (modems at 2.4 or 9.6 kbps). Nowadays, however, there is a lot of underutilized capacity in the optical fibers that interconnect today's routers, and more end users are moving to broadband access (DSL and cable modem). As a result, bandwidth efficiency is no longer as important as before. Instead, the processing at the routers is likely to become the performance bottleneck, so the key point is to simplify processing at these routers.

 

The second trend is that more and more applications on the Internet are connection based. When packet switching was proposed, the major network applications were email and telnet, which have short connection times. Nowadays, network traffic such as mp3 and software downloads tends to have longer connection times. It therefore becomes a question whether packet switching still offers the best solution.

 

We propose here an architecture called TCP Switch, which is a switch/router that intercepts TCP connection establishments and creates a pseudo-switched circuit to another TCP switch close to the destination end host. During the creation of the pseudo-switched circuit, resources (peak bandwidth) are allocated at the TCP switches along the path in the core of the network. The circuit creation, resource allocation and scheduling mechanisms are very simple. They are designed for optical switches, which are very fast, but have little buffering capabilities and logic.

 

The performance metric used to compare different schemes here is the response time experienced by a user, from the moment the user requests the information to the moment the last bit of that information arrives.

 

3         Architecture of TCP Switch

Please see the Appendix for details.

 

4         Implementation of TCP Switch in ns

Please see the Appendix for details.

5         Results and Discussions

5.1      Scenarios

5.1.1      Scenario 1

 

150 HTTP clients are connected to the ingress gateway switch, each through a 10 Mbps link with a propagation delay of 10 ms. The server is connected to the egress gateway switch through a 20 Mbps link with a propagation delay of 10 ms. The ingress and egress switches are each connected to the core switch through a 5 Mbps link with a propagation delay of 40 ms. Each client generates page requests at exponentially distributed intervals with a mean of 10 seconds.
 

 

 

 

 

5.1.2      Scenario 2

 

The setup of this scenario is shown in the following figure. It is similar to scenario 1, except that background web traffic between S1-D1 and S2-D2 is present. Core0 and Core1 are connected by a 5 Mbps link with a 20 ms propagation delay. Web servers are connected to the gateway switches by 50 Mbps links. All the other links have a bandwidth of 10 Mbps and a propagation delay of 10 ms. On the left side of the figure, 150 clients are connected to the TDM cloud, each generating page requests at exponentially distributed intervals with a mean of 10 seconds. For the background traffic, 10 clients are present in each of the two background traffic pairs. They too generate page requests at exponentially distributed intervals with a mean of 10 seconds. All the clients are started at the beginning of the simulation.
 

 

 

 

5.1.3      Scenario 3

 

The setup of this scenario is shown in the following figure. The center link has a bandwidth of 5 Mbps and a propagation delay of 20 ms. The inner circle links are 2 Mbps links and the outer circle links are 1 Mbps links. Clients and servers are connected to the TDM cloud through links of 1 Mbps and 20 Mbps, respectively. All links except the center link have a propagation delay of 10 ms. Web traffic is established in the diagonal direction across the cloud. There are 4 such traffic pairs. In each of them, a server serves page requests from a client cluster that contains 40 clients.

 

 

 

5.2      Simulation model

 

HTTP server and client agents provided by ns are used in the simulation. Two simulation models, 1-Tx-per-Client and multi-Tx-per-Client, are used in the simulation. They provide the performance bounds of the TCP Switch network. The meaning of each model is described below.

5.2.1      1-Tx-per-Client (this may be a misnomer)

In this mode, each client is served exactly once. At the beginning of the simulation, each client requests a web page. If the SYN packet goes through, i.e., the resource reservation succeeds, the file transfer begins. After the complete web page is received, the client does not request the page again. If the SYN does not go through, i.e., the resource reservation fails, the client resends the SYN until it finally goes through. (The HTTP agent module is modified to add this retry behavior in order to simulate user behavior more accurately, although it should not affect the result because the SYN is queued at the gateway TCP switch, which discards duplicate SYN messages.) The response time experienced by each client is calculated from the time the very first request is sent to the time the complete web page is received.

5.2.2      multi-Tx-per-Client (this may be a misnomer)

In this mode, the web page requests generated by each client are modeled as a Poisson process with a configurable mean interval, i.e. the intervals between requests are exponentially distributed. In all the following simulations, the mean interval is 10 seconds. Once a connection is established and a page transfer is in progress, the client does not generate another request until that transfer has finished. As in the 1-Tx-per-Client mode, the HTTP agent module is modified so that a client resends a SYN whose resource reservation fails; this simulates user behavior more accurately, although it should not affect the result because the SYN is queued at the gateway TCP switch, which discards duplicate SYN messages. A client may finish multiple transfers of the web page during the simulation. The response time of each transfer is calculated from its start and end times, and the average response time of each client is then calculated over these transfers.
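The request pattern of a single client in this mode can be sketched as follows. This is only an illustration, not the modified ns HTTP agent: the function name, the fixed transfer time, and the choice to draw the think time after each transfer finishes are assumptions, and blocking at the gateway is ignored.

```python
import random

def request_times(mean_interval_s, transfer_time_s, sim_end_s, seed=0):
    """Return (request_time, finish_time) pairs for one client.

    Assumption: the client draws an exponentially distributed think time,
    issues a request, and stays silent until that transfer finishes.
    """
    rng = random.Random(seed)
    now = 0.0
    transfers = []
    while True:
        now += rng.expovariate(1.0 / mean_interval_s)  # think time before the request
        finish = now + transfer_time_s                 # hypothetical fixed transfer time
        if finish > sim_end_s:
            break
        transfers.append((now, finish))
        now = finish  # no new request while a transfer is in progress
    return transfers

# Mean interval of 10 s and a 3600 s run, as in the simulations; 100 s per transfer is made up.
transfers = request_times(mean_interval_s=10.0, transfer_time_s=100.0, sim_end_s=3600.0)
print(len(transfers))  # N, the number of transfers this client completes
```

With a fixed run length, a longer transfer time leaves room for fewer transfers per client, which is why N stays small for large files.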

5.2.3      Analysis of TCP Switch Performance Bound

The two modes above, namely 1-Tx-per-Client and multi-Tx-per-Client, represent the best case and the worst case traffic pattern/user behavior for a TCP Switch network. In the best case, 1-Tx-per-Client, TCP Switch significantly outperforms a normal packet switched network when the number of clients is large and the file size is big. In the worst case, multi-Tx-per-Client, the delay (user experienced response time) of the TCP Switch network is quite similar to that of the packet switched network, with the TCP Switch network paying an extra overhead for control signaling processing. This is illustrated in the following example.

 

Assume there are 30 clients; each client finishes N transfers, with an average request interval of 10 seconds. Further assume that the TCP switched network can accommodate 10 clients at the same time, so these 30 clients are served in 3 batches in TCP switched network. Each batch finishes the transfer in 100 seconds after it is accepted. The 1-Tx-per-Client, 2-Tx-per-Client, and normal packet switch cases are as shown in the following figures.

 

The average response time for normal packet switched network is 300 seconds.

In the 1Tx situation, the average response time of all the clients is

(10 x 100 + 10 x 200 + 10 x 300) / 30 = 200 seconds;

In the 2Tx situation, the average response time of all the clients is

(10 x 100 + 10 x (300-10) + 10 x 200 + 10 x (200-10) + 10 x 300 + 10 x (300 - 10)) / 60 = 228 seconds.
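The two averages can be checked with a few lines of Python. This is only a re-evaluation of the arithmetic above, with the terms copied exactly as written in the text; the variable names are introduced here for illustration.

```python
# Response times in seconds for each group of 10 clients, as listed in the text.
one_tx = [100, 200, 300]
avg_1tx = sum(10 * t for t in one_tx) / 30

two_tx = [100, 300 - 10, 200, 200 - 10, 300, 300 - 10]
avg_2tx = sum(10 * t for t in two_tx) / 60

print(avg_1tx)         # 200.0
print(round(avg_2tx))  # 228
```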

 

 

 

 

 

 

 

 

 

 

It is easy to see that as N goes up, the average response time in the TCP switched network approaches that of the normal packet switched network. The reason is that after the first transfer, each remaining transfer is blocked until the other two batches finish before it can be served, which brings the average delay to the same level as that of the packet switched network. Of course, two assumptions are made here: 1. the same workload is applied to the network all the time, so that a later request is always blocked for a certain amount of time; 2. the same set of clients generates these requests repeatedly and the request interval is quite small compared to the file transfer time, so that the blocking time significantly increases the overall delay. (In the simulations using multi-Tx-per-Client below, the average interval is 10 seconds, which is about the same as the shortest file transfer time in the simulations.)

 

In reality, the traffic load varies from time to time, and the interval between requests from a client varies. Thus, under a real scenario, the performance of a TCP switched network is expected to fall somewhere between the 1-Tx-per-Client and the multi-Tx-per-Client patterns described above. In other words, the performance gain in terms of response time experienced by users in a TCP switched network should be bounded by the gains in the 1-Tx-per-Client and the multi-Tx-per-Client patterns.

 

The result of a multi-Tx-per-Client run is shown in the following figure. The simulation is run for 3600 seconds to let each client finish multiple transfers. When the file size is below 1 MB, the difference between the TCP Switch network and the packet switched network is small, with TCP Switch's delay slightly higher because of the extra overhead for control signaling. When the file size is big, such as 1.5 MB or 2 MB, TCP Switch gives a much lower delay than the packet switched network. However, the difference is only about half of that obtained under the 1-Tx-per-Client mode, presented later. The reason, as mentioned above, is that as N, the number of transfers completed by each client, goes up, the overall average delay for each client increases as well (assuming the request interval is quite small compared to the file transfer time, so that we have the worst case). A difference in delay is still visible for big files because N is still not large enough when the file size is 1.5 MB or more and the simulation time is 3600 seconds.

 

 

 

Scenario 1. The simulation is run for 3600 seconds under scenario 1. 150 clients. Each client generates requests with an exponential average interval of 10 seconds and completes multiple transfers during the simulation. Web page size varies from 50 kB to 1.5 MB. Packet Switch (PS), TCP Switch (TS) are compared. In TCP Switch, timeout value =  1 second, slot size = 1 Byte. Quota = 1 x 64 kbps.

 

 

 

Because the multi-Tx-per-Client mode represents the worst case traffic pattern for TCP Switch, it leaves little room in performance gain for exploring the impact of different factors on TCP Switch. As a result, the 1-Tx-per-Client mode is used when studying the impact of various factors on TCP Switch's performance.

 

The following sections discuss the impact of file size, quota value, timeout value, cell size and traffic load (number of clients). All the simulation results presented in later sections are obtained using the 1-Tx-per-Client mode.

 

 

 

5.3      Impact of web page size

 

Simulations with various page sizes are run under scenario 1. The results are shown in the following figures. In general, TCP Switch's advantage grows with the file size, and it outperforms the normal packet switched network by as much as 36%. When the file size is small, for example in the range of 50 kB to 200 kB, the normal packet switched network performs better. This is expected, since the overhead paid in the TCP switched network offsets its gain when the file size is small.

 

 

Scenario 1. 150 clients. Each client finishes one transfer. Web page size varies from 50 kB to 1.5 MB. Packet Switch (PS), TCP Switch (TS) are compared. In TCP Switch, timeout value =  1 second, slot size = 1 Byte. Quota value is in unit of 64 kbps.

 

 

 

This figure also shows the effect of different quota values on the average delay. Generally speaking, a larger quota results in a smaller delay. The reason is that a larger quota means fewer clients in each batch processed at the same time, a larger number of batches, and a shorter delay for each batch of clients.

 

The request and receive events of the web pages are recorded in a log file. In the following figures, the "batch" processing when the quota is 1 and 2 is clearly seen. Roughly 2 batches are needed when the quota is 1, and 4 batches are needed when the quota is 2.

 

 

Scenario 1. quota = 1 x 64 kbps, timeout = 1 second, slot size = 1. 150 clients. File size = 1 MB.

The bottleneck link bandwidth can accommodate ~70 clients at the same time. The majority of the 150 clients are processed in roughly two batches, with some leftover clients processed in a third batch. From the figure, it can be seen that the first batch finishes at around 140 seconds, the second batch at around 280 seconds, and the leftover clients in the third batch at around 420 seconds. Some clients' transfer time is much longer than 140 seconds because their circuit in the TCP Switch network core is timed out in the middle of the connection.

 

 

Scenario 1. quota = 2 x 64 kbps, timeout = 1 second, slot size = 1. 150 clients. File size = 1 MB.

The bottleneck link bandwidth can accommodate ~35 clients at the same time. The majority of the 150 clients are processed in roughly four batches, with some leftover clients processed in a fifth batch. From the figure, it can be seen that the first, second, third, and fourth batches finish at around 70, 140, 210, and 280 seconds, respectively. The leftover clients are processed in the fifth batch and finish at around 350 seconds. Some clients' transfer time is much longer than 70 seconds, which is the normal transfer time for a 1 MB file when 2 x 64 kbps is reserved. This is because their circuit in the TCP Switch network core is timed out in the middle of the connection.
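The batch counts and batch durations above can be estimated with a back-of-the-envelope calculation. The sketch below is illustrative only: the function name is hypothetical, 1 MB is assumed to be 10^6 bytes, and the concurrency figures (~70 and ~35 clients) are taken from the text rather than derived from the link capacity.

```python
import math

def batch_estimate(n_clients, concurrent_clients, file_bytes, quota_bps):
    """Return (batches, ideal transfer time per batch in s, ideal finish time of last batch in s)."""
    batches = math.ceil(n_clients / concurrent_clients)
    transfer_s = file_bytes * 8 / quota_bps  # payload only, no protocol overhead
    return batches, transfer_s, batches * transfer_s

q1 = batch_estimate(150, 70, 1_000_000, 1 * 64_000)
q2 = batch_estimate(150, 35, 1_000_000, 2 * 64_000)
print(q1)  # (3, 125.0, 375.0)
print(q2)  # (5, 62.5, 312.5)
```

The ideal per-batch times (125 and 62.5 seconds) are somewhat below the roughly 140 and 70 seconds read from the figures. The gap is presumably due to protocol overhead and the timeout between batches, which this estimate does not model.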

 

 

The results of delays with various web page sizes for scenario 2 and scenario 3 are shown in the following figures.

 

 

 

Scenario 2. 150 clients. 10 x 2 clients in background traffic. Each client finishes one transfer. Web page size varies from 50 kB to 1.5 MB. Packet Switch (PS), TCP Switch (TS) are compared. In TCP Switch, timeout value =  1 second, slot size = 1 Byte. quota value is in unit of 64 kbps.

 

 

 

Scenario 3. 40 x 4 clients. Each client finishes one transfer. Web page size varies from 50 kB to 1.5 MB. Packet Switch (PS), TCP Switch (TS) are compared. In TCP Switch, timeout value =  1 second, slot size = 1 Byte. quota value is in unit of 64 kbps.

 

5.4      Impact of quota values

 

See the results shown above in the Impact of Web Page Size section.

5.5      Impact of number of clients

 

When the number of clients is large and the file size is big (> 500 kB), TCP Switch outperforms the packet switched network by as much as 28% in terms of delay. For both the packet switched network and the TCP switched network, the delay increases when the file size increases. However, the delay of the TCP switched network increases with a much smaller slope when the file size is larger than 500 kB. Similarly, for both networks, the delay increases when the number of clients increases, but the delay of the TCP switched network increases at a slower pace when the file size exceeds 500 kB. (When the file size is smaller than 500 kB, the TCP switched network does not show a clear advantage over the packet switched network.)

 

Another observation is that in TCP switched network, when the number of clients is smaller than or equal to the one that can be processed in one batch, the delay will remain the same since every client gets the same bandwidth in each case and no one can claim the unused bandwidth. This is seen in the figure: 30-Ts, 60-Ts, and 70-Ts have the same results and their lines overlap with each other. In contrast, 30-Ps, 60-Ps, and 70-Ps have different results, with 30-Ps having the smallest delays.

 

 

Scenario 1. 150 clients. Each client finishes one transfer. Web page size varies from 50 kB to 1.5 MB. Packet Switch (PS), TCP Switch (TS) are compared. In TCP Switch, timeout value =  1 second, slot size = 1 Byte, quota = 1 x 64 kbps. TCP Switch's result lines are marked by labels in the figure.

 

 

 

This figure is an enlarged view for file sizes ranging from 50 kB to 500 kB.

Scenario 1. 150 clients. Each client finishes one transfer. Web page size varies from 50 kB to 500 kB. Packet Switch (PS), TCP Switch (TS) are compared. In TCP Switch, timeout value =  1 second, slot size = 1 Byte, quota = 1 x 64 kbps. TCP Switch's result lines are marked by labels in the figure.

 

 

 

 

This figure is an enlarged view for file sizes ranging from 500 kB to 2 MB.

Scenario 1. 150 clients. Each client finishes one transfer. Web page size varies from 500 kB to 2 MB. Packet Switch (PS), TCP Switch (TS) are compared. In TCP Switch, timeout value =  1 second, slot size = 1 Byte, quota = 1 x 64 kbps. TCP Switch's result lines are marked by labels in the figure.

 

5.6      Delay vs. Cell Size

 

In the TDM link used in the simulation, a quota value of 1 x 64 kbps corresponds to a cell size of 1 byte (a 1 x 64 kbps quota should not have a cell size bigger than 1 byte). A quota value of 2 x 64 kbps can have a cell size of either 1 or 2 bytes.
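The relation between quota and cell size can be sketched as follows. The sketch assumes the conventional TDM frame rate of 8000 frames per second (one frame every 125 microseconds), which the text does not state explicitly; the function names are introduced here for illustration.

```python
FRAMES_PER_SECOND = 8000  # assumption: one TDM frame every 125 microseconds

def slot_rate_bps(cell_bytes):
    """Bandwidth of one slot that carries cell_bytes in every frame."""
    return cell_bytes * 8 * FRAMES_PER_SECOND

def slots_needed(quota_bps, cell_bytes):
    """Slots per frame needed for quota_bps, or None if the cell is too coarse."""
    rate = slot_rate_bps(cell_bytes)
    return quota_bps // rate if quota_bps % rate == 0 else None

print(slot_rate_bps(1))          # 64000
print(slots_needed(128_000, 1))  # 2
print(slots_needed(128_000, 2))  # 1
print(slots_needed(64_000, 2))   # None
```

Under this assumption, a 2 x 64 kbps quota is carried either as two 1-byte slots or as one 2-byte slot, while a 1 x 64 kbps quota cannot use a 2-byte cell.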

 

In the following simulation, cell sizes of 1 byte and 2 bytes are compared under scenario 1 with a quota value of 2 x 64 kbps. As expected, cell size does not affect the delay (the lines of slot=1B and slot=2B overlap in the figure), although it does affect the size of the control messages.

 

 

 

Scenario 1. 150 clients. Each client finishes one transfer. Web page size varies from 50 kB to 1.5 MB. Packet Switch (PS), TCP Switch (TS) with cell size = 1 Byte and cell size = 2 Byte are compared. In TCP Switch, timeout value =  1 second, quota = 2 x 64 kbps.

 

5.7      Delay vs. timeout value

 

A timeout value that is too short generates false timeouts in the middle of a connection. The false timeouts offset the advantages of resource reservation and degrade the performance of TCP Switch. Another way of thinking about this is that a TCP Switch network with an infinitely small timeout value is effectively reduced to a packet switched network, if the control signaling overhead is not considered.

 

On the other hand, a timeout value that is too big increases the interval needed between different batches of clients. When the number of batches increases (i.e. the quota value increases or the total number of clients increases), this extra interval can degrade the performance as well.
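The effect of a short timeout can be illustrated by counting, for one flow, the idle gaps that exceed the timeout. This is a simplified sketch, not the ns module: it assumes the timeout is an inactivity timer restarted by every packet of the flow, and the arrival times are made up.

```python
def count_false_timeouts(packet_times, timeout_s):
    """Count gaps between consecutive packets of one flow that exceed timeout_s.

    Each such gap would tear the circuit down in the middle of the connection,
    so the flow would have to reserve resources again before continuing.
    """
    return sum(
        1
        for prev, cur in zip(packet_times, packet_times[1:])
        if cur - prev > timeout_s
    )

arrivals = [0.0, 0.2, 0.4, 2.0, 2.1, 4.6, 4.7]  # hypothetical packet times in seconds
print(count_false_timeouts(arrivals, 1.0))  # 2
print(count_false_timeouts(arrivals, 3.0))  # 0
```

Raising the timeout from 1 second to 3 seconds removes both false timeouts in this example, at the price of holding an idle circuit for up to 3 seconds before the next batch can reuse it.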

 

The following two figures show the comparison of delays with different timeout values under scenario 1 and 2.

 

In the first figure, when the quota is 1 x 64 kbps, increasing timeout from 1 second to 3 seconds improves the delay slightly. This is because of the reduction of the number of false timeouts (see the GET-RCV event sequence figure above in the Web Page Size section for the false timeout illustration).  When the quota is 2 x 64 kbps, increasing the timeout from 1 second to 3 seconds makes the delay bigger. This is because the disadvantage coming from the extra delay between different batches offsets the advantage of a larger timeout.

 

In the second figure, when the quota is 1 x 64 kbps, increasing the timeout from 1 second to 3 seconds has little effect. This is probably because there are too few false timeouts in this case to make a difference. When the quota is 2 x 64 kbps, increasing the timeout from 1 second to 3 seconds makes the delay bigger, for the same reason as mentioned above.

 

 

Scenario 1. 150 clients. Each client finishes one transfer. Web page size varies from 50 kB to 1.5 MB. Packet Switch(PS) and TCP Switch (TS) with different quota and timeout values are compared. cell size = 1 Byte. Quota value is in unit of 64 kbps.

 

 

Scenario 2. 150 clients. 10 x 2 clients in background traffic. Each client finishes one transfer. Web page size varies from 50 kB to 1 MB. Packet Switch(PS) and TCP Switch (TS) with different quota and timeout values are compared. cell size = 1 Byte. Quota value is in unit of 64 kbps.

 

6         Conclusion

 

Three conclusions can be drawn from the simulation results.

 

First, traffic pattern and user behavior affect TCP Switch's performance. The performance is bounded by the best case pattern (1-Tx-per-Client) and the worst case pattern (multi-Tx-per-Client, i.e. N repeated transfers per client). The real traffic pattern and user behavior will result in a performance that falls between the best case and the worst case.

 

Second, TCP Switch outperforms the normal packet switched network when the number of clients is large and the file size is big (i.e. the transfer time is long). The boundary condition for the number of clients depends on the bottleneck link's capacity. For example, in scenarios 1 and 2, it is 70. The boundary condition for the file size is 500 kB, 500 kB, and 200 kB for scenarios 1, 2 and 3, respectively. When the number of clients is small and the file size is small, the normal packet switched network provides a better delay, because TCP Switch still incurs a certain amount of control signaling overhead and imposes an upper limit on the maximum resource each client can use.

 

Third, quota value and timeout value affect the performance, but they are factors of secondary importance. Cell size does not affect delay, only the amount of control messages. The factors of primary importance, which determine whether TCP Switch performs better than a packet switched network, are the traffic pattern, the number of clients, and the file size.

7         Future Work

 

Simulations using real web traces can be run to verify the performance bounds proposed above. Since the average connection time in the current Internet tends to be small, the normal packet switched network may perform better. Web traces containing longer connections should also be chosen to test the scenario where TCP Switch may perform better than the normal packet switched network. Real scenarios under which TCP Switch provides an advantage should be identified.

 

 

APPENDIX

8         Appendix - Architecture of TCP Switch

 

This section is taken from [5]. Please refer to [5] for details.

8.1       Overview

The network topology shown in fig. 1 is used in the discussion of the architecture. R1 and R2 are IP routers. C1 is a cloud of TDM switches. S1 and S3 are both gateway switches. S2 is a core switch within the cloud. For a flow that goes in the direction R1 -> S1 -> S2 -> S3 -> R2, S1 is the ingress gateway switch and S3 is the egress gateway switch.
 
 
 
TCP switch is implemented on gateway switches and core switches in the TDM cloud. The gateway switch has three major components: flow classification, admission control/resource allocation, and control signalling processing. The core switch does not need the flow classifier. But the other two components, namely admission control/resource allocation and control signalling processing are needed in the core switch. 
 
A signalling mechanism is designed to let the switches in the TDM cloud exchange resource reservation-related information with each other. This signalling uses four types of messages that are sent over a control channel on the TDM link. The four types of control messages are SETUP, TEARDOWN, REJECT, and ACK. Circuit setup is initiated by the ingress gateway switch when it detects the first packet of a new "switchable" flow. Circuit teardown is initiated by the ingress gateway switch as well. It can be triggered by detection of the "end of a flow", either by receiving a FIN/FIN ACK packet or by a timeout. Circuit setup/teardown at the core switch is triggered only by explicit messages.
 
The following sections describe gateway switches, core switches, their components and the signalling mechanisms in detail.
 

8.2       Ingress Gateway Switch

The architectures of an ingress gateway switch and an egress switch are quite similar, although minor differences exist. Since a gateway switch could be both an ingress and an egress, it should support both ingress' and egress' capabilities.
 
The major components of an ingress gateway switch are shown in fig. 2.
 
 

8.2.1       Flow Classifier

This component determines whether any resource should be allocated for a flow. Since resources are only reserved for TCP flows, non-TCP flows are simply processed as normal. For a TCP flow, the flow classifier first checks whether a "circuit" has already been established for this flow. If so, the packet is switched using that "circuit". Otherwise, the flow classifier invokes the call admission module to perform the call admission/resource reservation process. The call admission module decides whether this flow can be accepted or should be rejected.
In order to maintain the established "circuit" information for each accepted flow and to facilitate rapid lookup, a hash table can be used. The hash key is the 4-tuple that identifies the TCP connection (presumably the source IP address, source port, destination IP address, and destination port). Each bucket in the hash table contains the corresponding output port number and slot number.
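The classification steps and the hash table lookup can be sketched as follows. This is a minimal illustration rather than the actual switch implementation: the type and function names are hypothetical, the 4-tuple is assumed to be the usual TCP connection identifier, and a Python dictionary stands in for the hash table.

```python
from typing import NamedTuple, Optional, Tuple

class FlowKey(NamedTuple):
    # Assumed contents of the 4-tuple used as the hash key.
    src_ip: str
    src_port: int
    dst_ip: str
    dst_port: int

class Circuit(NamedTuple):
    out_port: int
    slots: Tuple[int, ...]

circuits = {}  # FlowKey -> Circuit, one entry per accepted flow

def classify(protocol: str, key: FlowKey) -> Tuple[str, Optional[Circuit]]:
    """Decide how a packet is handled at the ingress gateway switch."""
    if protocol != "tcp":
        return ("forward_normally", None)      # no reservation for non-TCP flows
    circuit = circuits.get(key)
    if circuit is not None:
        return ("switch_on_circuit", circuit)  # circuit already established
    return ("call_admission", None)            # new flow: ask the admission module

key = FlowKey("10.0.0.1", 40000, "10.0.1.1", 80)
print(classify("udp", key))  # ('forward_normally', None)
print(classify("tcp", key))  # ('call_admission', None)
circuits[key] = Circuit(out_port=2, slots=(5,))
print(classify("tcp", key)[0])  # switch_on_circuit
```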
 

8.2.2       Call admission/Resource reservation module

This module decides whether a flow can be accepted or not. To do this, it first needs an estimate of how much resource should be allocated to that particular flow. This is done in an application-specific way. The type of application a flow belongs to is determined by checking the port number in the packet against the well-known ports. For each type of application, the required bandwidth is given by a fixed, pre-configured value. For a TCP switch in the middle of the TDM cloud (TDM-in, TDM-out), the required bandwidth is specified by the upstream switch.
Once the desired bandwidth of the flow has been estimated, the admission decision is made simply by comparing the estimate with the resources currently available. If the flow is accepted, a "circuit" is created for it by assigning an output port and output slots to it, and an entry is created in the hash table used by the flow classifier to record this.
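A minimal sketch of this admission decision is given below. The class name, the per-application bandwidth table, and the 64 kbps slot rate are illustrative assumptions, not values taken from the implementation.

```python
import math

SLOT_BPS = 64_000  # assumed bandwidth of one output slot
# Hypothetical pre-configured bandwidth per application, keyed by well-known port.
APP_BANDWIDTH_BPS = {80: 64_000, 21: 128_000}
DEFAULT_BPS = 64_000

class OutputPort:
    def __init__(self, n_slots):
        self.free_slots = list(range(n_slots))

    def admit(self, dst_port):
        """Return the slots assigned to the flow, or None if it is rejected."""
        wanted_bps = APP_BANDWIDTH_BPS.get(dst_port, DEFAULT_BPS)
        n = math.ceil(wanted_bps / SLOT_BPS)
        if n > len(self.free_slots):
            return None  # not enough free resources: reject the flow
        assigned, self.free_slots = self.free_slots[:n], self.free_slots[n:]
        return assigned

port = OutputPort(n_slots=3)
print(port.admit(80))  # [0]
print(port.admit(21))  # [1, 2]
print(port.admit(80))  # None
```

In a full switch the returned slots would also be recorded in the flow classifier's hash table; that step is left out here to keep the sketch short.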
Scheduler
On each output port, a queue is maintained for each output slot. To avoid packet misordering, a flow's packets are buffered in this queue until the circuit for this flow within the TDM cloud has been set up successfully.
 
 

8.3       Egress Gateway Switch

 
The major components of an egress gateway switch are shown in the following figure.
 
 
 
Control Signaling Processing
This is discussed in the signaling section.
 

8.3.1      Admission Control

This is similar to the Admission Control of an ingress gateway switch. But here in addition to the hash table, it needs to maintain the slot number-flow id mapping table for packet reassembly as well.  
 

8.3.2      Packet Reassembly

A flow may be assigned multiple noncontiguous slots, and a single packet can span two or more slots. In order to reassemble packets from multiple slots, a buffer space is used for each flow. A table indexed by slot number is maintained to map each slot to its flow's buffer: each entry of the table stores a pointer to the buffer space of the flow to which the slot belongs. The buffer also keeps track of the current start and end offsets.
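The slot-indexed table can be sketched as follows. The class name and slot numbers are made up, and the 1-byte length prefix used to find packet boundaries is purely a hypothetical framing chosen to keep the example short; the text does not say how packet boundaries are detected.

```python
class FlowBuffer:
    """Per-flow reassembly buffer shared by all slots of the flow."""

    def __init__(self):
        self.data = bytearray()

    def push(self, cell: bytes):
        """Append one cell and return any packets that are now complete."""
        self.data += cell
        packets = []
        # Hypothetical framing: a 1-byte length prefix followed by the payload.
        while self.data and len(self.data) >= 1 + self.data[0]:
            length = self.data[0]
            packets.append(bytes(self.data[1:1 + length]))
            del self.data[:1 + length]
        return packets

# Slots 3 and 7 are noncontiguous but belong to the same flow.
flow_a = FlowBuffer()
slot_table = {3: flow_a, 7: flow_a}

cells = [(3, b"\x03a"), (7, b"bc"), (3, b"\x01z")]  # (slot number, 2-byte cell)
out = []
for slot, cell in cells:
    out += slot_table[slot].push(cell)
print(out)  # [b'abc', b'z']
```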
 

8.3.3      Classifier

The classifier takes the reassembled packet and extracts its flow id. Using this flow id as the hash key, the packet scheduling parameters are looked up through the hash table. The packet is then scheduled using these parameters.
 

8.3.4      Scheduler

Class Based Queuing with Weighted Round Robin (CBQ/WRR) can be used to schedule the packets.
 

8.4       Core Switch

The major components of the core switch are shown in the following figure.
 
 
In the core switch, the control channel is always "switched" to the admission control/control signaling processing module first. Depending on the type of the control message, this module takes the corresponding action (see the signaling part).
 
The data channel is always switched without going through the admission control module. This is because only flows whose circuits have already been set up within the TDM cloud are allowed to send packets (this is controlled by the ingress gateway switch).
 

8.5       Resource reservation signaling within a TDM cloud

The signaling for circuit setup/teardown in the TDM cloud is sent over a control channel. Four types of signaling messages are used: SETUP, TEARDOWN, REJECT, and ACK. Wherever possible, the next-hop lookup by destination IP address is avoided. Instead, a direct mapping between incoming port/time slot and output port/time slot is performed to speed up processing at the core switch. Thus, only the SETUP message contains the destination IP address.
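As an illustration of the four message types, the sketch below packs and unpacks a signaling message. The 2-bit type codes, the 32-bit Dest field, the 255 maximum for nSlots and the 16-bit slot numbers come from the field descriptions in the following subsections. The byte layout itself is an assumption made here for illustration: one byte for TYPE, one byte for nSlots, network byte order, and the nSlots range 1..255 applied to every message type. The function names encode and decode are hypothetical.

```python
import struct

# Type codes as given in the text.
SETUP, TEARDOWN, REJECT, ACK = 0b00, 0b01, 0b10, 0b11


def encode(msg_type, slots, dest=None):
    """Pack one signaling message (assumed byte layout, see lead-in)."""
    if not 1 <= len(slots) <= 255:
        raise ValueError("nSlots must be between 1 and 255")
    out = struct.pack("!B", msg_type)
    if msg_type == SETUP:
        out += struct.pack("!I", dest)  # only SETUP carries Dest
    out += struct.pack("!B", len(slots))
    return out + struct.pack("!%dH" % len(slots), *slots)


def decode(data):
    """Inverse of encode(); returns (msg_type, dest_or_None, slots)."""
    msg_type = data[0]
    offset, dest = 1, None
    if msg_type == SETUP:
        (dest,) = struct.unpack_from("!I", data, offset)
        offset += 4
    n = data[offset]
    slots = list(struct.unpack_from("!%dH" % n, data, offset + 1))
    return msg_type, dest, slots
```

Under this assumed layout a SETUP for two slots occupies 10 bytes, while an ACK for one slot occupies 4 bytes, since it carries no destination address.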
 

8.5.1      SETUP

 
The message format of SETUP is as follows:
 
 
TYPE: 00. SETUP.
 
Dest:   This is the 32-bit destination IP address. 
 
nSlots:   Number of slots requested. The ingress gateway switch determines this number using an application-specific estimate. The value of nSlots should be at least 1, and its maximum value is 255. This should be enough to accommodate the number of slots needed by a very "big" flow even when each slot's bandwidth is very small (for a flow that requires 5 Mbps when each time slot corresponds to a bandwidth of 56 kbps, the number of slots needed is 5 Mbps/56 kbps ~ 100). If we assume an average bandwidth of 500 kbps per flow and a slot of 56 kbps, then nSlots should be ~10 on average.
 
Slot 1, Slot 2, ... Slot n: These n fields specify the time slot numbers assigned to this flow with FlowID by the upstream switch. Each time slot number is a 16-bit value. This should be able to accommodate the current "fat" SONET links up to STS-48, or OC-48. Assuming the minimum bandwidth corresponding to each time slot is 56 kbps, the number of slots used by STS-1, or OC-1, will be 51.84 Mbps/56 kbps ~ 1000. The number of slots used by STS-48, or OC-48, will be 2488.32 Mbps/56 kbps ~ 50,000.
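The field widths above can be checked with a short calculation. The sketch below rounds each quotient up to a whole number of slots. The helper name slots_needed is hypothetical, and the rates are the ones quoted in the text, whose figures are rounded more coarsely.

```python
SLOT_BPS = 56_000  # bandwidth of one time slot, as assumed in the text


def slots_needed(rate_bps, slot_bps=SLOT_BPS):
    """Smallest whole number of slots whose total bandwidth covers rate_bps."""
    return -(-rate_bps // slot_bps)  # integer ceiling division


big_flow = slots_needed(5_000_000)      # 90, the text's "~100"
avg_flow = slots_needed(500_000)        # 9, the text's "~10"
oc1 = slots_needed(51_840_000)          # 926, the text's "~1000"
oc48 = slots_needed(2_488_320_000)      # 44435, the text's "~50,000"

assert big_flow <= 255        # fits the nSlots maximum
assert oc48 <= 2**16 - 1      # fits a 16-bit slot number
```

Both limits hold with some margin: a 5 Mbps flow needs well under 255 slots, and an OC-48 link divided into 56 kbps slots needs fewer than 65,536 slot numbers.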
 
Processing of the SETUP message
The ingress gateway switch initiates the circuit setup process when the first packet of a new "switchable" flow is detected (this could be either a TCP SYN message or a data packet belonging to a new TCP flow not seen before at this ingress switch). The ingress switch allocates a certain number of output time slots to this flow. Then it sends a SETUP message to its downstream switch. The SETUP message indicates how many incoming slots the downstream switch should be prepared to accept and which slots these are.
 
Upon receipt of such a SETUP message, the downstream switch tries to allocate resources (time slots) according to the SETUP request, that is, to allocate nSlots slots on the corresponding output port. If the allocation succeeds, this downstream switch prepares and sends a new SETUP message to its own downstream switch. If the allocation fails, it should generate a REJECT message and send it upstream, as described in the REJECT message part. The new SETUP message indicates the output time slot numbers (they are incoming slot numbers from the next downstream switch's point of view), so that the next downstream switch can again allocate resources accordingly. The next-hop lookup is done using the Dest field, the destination IP address.
 
In this way, the SETUP message is propagated along the path within the TDM cloud. Resources are allocated at each switch along the path, in the forward direction. The last switch on this path is an egress gateway switch. If it succeeds in allocating the requested resources, the egress gateway switch sends an ACK back upstream (see the ACK part).
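The hop-by-hop SETUP processing can be sketched as follows. This is a simplified model under stated assumptions, not the ns code: every downstream switch, including the egress, is treated alike, each switch has a single outgoing port, and free slots are handed out in order. The names Switch, on_setup and propagate_setup are hypothetical.

```python
class Switch:
    def __init__(self, name, free_slots):
        self.name = name
        self.free = list(free_slots)  # free output slots on the outgoing port
        self.in_to_out = {}           # incoming slot -> output slot

    def on_setup(self, in_slots):
        """Allocate one output slot per incoming slot; None if not enough."""
        if len(self.free) < len(in_slots):
            return None
        out_slots = [self.free.pop(0) for _ in in_slots]
        self.in_to_out.update(zip(in_slots, out_slots))
        return out_slots


def propagate_setup(path, slots):
    """Carry a SETUP along `path`, rewriting the slot list at every hop.

    `slots` are the output slots already chosen by the ingress switch.
    Returns ("ACK", slots allocated by the last switch) on success, or
    ("REJECT", name of the switch that could not allocate).
    """
    for sw in path:
        out = sw.on_setup(slots)
        if out is None:
            return "REJECT", sw.name
        slots = out  # these are incoming slots for the next hop
    return "ACK", slots
```

With a core switch holding three free slots and an egress switch holding two, a first two-slot request is acknowledged, while a second one is rejected at the core switch, which has only one slot left.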
 

8.5.2      TEARDOWN

 
 
TYPE: 01. TEARDOWN. 
 
nSlots:  Number of slots to be released. The ingress gateway switch stores the number of slots that are assigned to this flow and initiates the TEARDOWN process.
 
Slot 1, Slot 2, ... Slot n: These n fields specify the time slot numbers to be released. Each time slot number is a 16-bit value. As in the SETUP message, this should be able to accommodate the current "fat" SONET links up to STS-48, or OC-48. For a downstream switch that receives this message, these slot numbers are the incoming ones. The downstream switch should release the output slots tied to these incoming slots. It then rewrites the Slot 1, Slot 2, ... Slot n fields with the corresponding n output slot numbers and sends the rewritten message to its own downstream switch.
 
Processing of the TEARDOWN message
The TEARDOWN process is initiated by the ingress gateway switch when it detects the end of a flow. This could be either the receipt of a TCP FIN/FIN ACK packet or a timeout for an idle flow. Using its hash table, the ingress switch retrieves the time slots allocated to this flow. Using this information, a TEARDOWN message is prepared and sent to the corresponding downstream switch. In the message, the ingress switch indicates the time slots (they are incoming time slots from the downstream switch's point of view) for which the downstream switch should deallocate resources.
 
Upon receipt of such a message, the downstream switch deallocates the output time slots associated with the incoming time slots indicated in the TEARDOWN message. It then generates a new TEARDOWN message and sends it to the next downstream switch. In the new TEARDOWN message, the Slot 1, ... Slot n fields are rewritten with the corresponding output time slot numbers at this downstream switch.
 
This TEARDOWN message is propagated along the path. Each switch on the path deallocates its corresponding resources. The last switch to do so is the egress gateway switch.
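A minimal sketch of the TEARDOWN path is given below. It assumes that each switch keeps an incoming-slot to output-slot map, as built during SETUP, together with a list of free output slots. The function names and the idle_timeout parameter are hypothetical.

```python
def flow_ended(saw_fin, last_packet_time, now, idle_timeout):
    """End of flow: a FIN was seen, or the flow has been idle too long."""
    return saw_fin or (now - last_packet_time) > idle_timeout


def on_teardown(in_to_out, free, in_slots):
    """Release the output slots tied to in_slots; return them for the next hop."""
    out_slots = [in_to_out.pop(s) for s in in_slots]
    free.extend(out_slots)
    return out_slots


def propagate_teardown(switches, slots):
    """switches: (in_to_out, free) pairs in downstream order.

    `slots` starts as the ingress switch's output slots and is rewritten
    at every hop, mirroring the rewriting of the Slot 1 ... Slot n fields.
    """
    for in_to_out, free in switches:
        slots = on_teardown(in_to_out, free, slots)
    return slots
```

After the message has passed the last switch, every map entry for the flow has been removed and the corresponding output slots are back in the free lists.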
 

8.5.3      REJECT

 
 
TYPE: 10. REJECT. 
 
nSlots:  Number of slots to be released.   
 
Slot 1, Slot 2, ... Slot n: These n fields specify the time slot numbers to be released. Each time slot number is a 16-bit value. As in the SETUP message, this should be able to accommodate the current "fat" SONET links up to STS-48, or OC-48.
 
Processing of the REJECT message
REJECT is generated during the SETUP process. If a core switch or the egress gateway switch cannot allocate the resources requested in the SETUP message, the reservation for this flow within the TDM cloud fails. The core switch (or egress switch) that decides to reject the SETUP should stop forwarding the SETUP message, generate a REJECT message, and send it to the upstream switch. In this REJECT message, the switch indicates which incoming time slots (they are output time slots from the upstream switch's point of view) it is rejecting.
 
Upon receipt of such a REJECT message, the upstream switch releases the time slots indicated in the message (these slots are output slots for this upstream switch). It then generates a new REJECT message and forwards it to its own upstream switch. In this new REJECT message, the slot numbers are rewritten with those corresponding incoming time slots of this flow. 
 
This REJECT message is thus propagated upstream until it reaches the ingress gateway switch. Each switch along the path will release the corresponding time slots.
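The upstream rewriting of a REJECT can be sketched in the same style. Here each core switch is assumed to keep the reverse map, from output slot to incoming slot, for the flow being set up. The names on_reject and propagate_reject are hypothetical.

```python
def on_reject(out_to_in, free, rejected_out_slots):
    """Release the output slots named in the REJECT.

    Returns the incoming slots they were mapped from, which go into the
    REJECT that this switch sends further upstream.
    """
    in_slots = [out_to_in.pop(s) for s in rejected_out_slots]
    free.extend(rejected_out_slots)
    return in_slots


def propagate_reject(upstream_cores, ingress_free, slots):
    """upstream_cores: (out_to_in, free) pairs, ordered from the switch next
    to the rejecting one back toward the ingress gateway switch."""
    for out_to_in, free in upstream_cores:
        slots = on_reject(out_to_in, free, slots)
    ingress_free.extend(slots)  # the ingress only releases its own output slots
    return slots
```

The slot list shrinks back to the ingress switch's own output slots by the time the REJECT arrives there, so the ingress can release them without any further lookup.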
 

8.5.4      ACK

 
 
TYPE: 11. ACK. 
 
nSlots: Number of slots ACKed (successfully allocated).
 
Slot 1, Slot 2, ... Slot n: These n fields specify the time slot numbers that are successfully allocated. Each time slot number is a 16-bit value. As in the SETUP message, this should be able to accommodate the current "fat" SONET links up to STS-48, or OC-48.
 
Processing of the ACK message
If the SETUP process is successful, the egress switch will generate an ACK message (after the egress itself processes the SETUP request) and send it upstream. The ACK message contains the time slots for which the sending switch has successfully allocated resources. 
 
Upon receipt of such an ACK, a core switch should remap those time slots to its own corresponding incoming time slots, rewrite the message with the remapped slots and send the new ACK upstream. 
 
Upon receipt of an ACK, the ingress switch should start sending packets in the slots indicated in the ACK. The packets are dequeued from the corresponding queues.
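The ingress-side behavior, buffering a flow's packets until the ACK arrives and then draining them in order, can be sketched as below. The class IngressFlow and its methods are hypothetical, and how packets are divided among the acknowledged slots is left out.

```python
from collections import deque


class IngressFlow:
    def __init__(self):
        self.queue = deque()  # packets buffered while the circuit is pending
        self.slots = None     # set once the ACK arrives

    def on_packet(self, pkt):
        """Queue the packet; it is released at once if the circuit is up."""
        self.queue.append(pkt)
        return self.drain()

    def on_ack(self, acked_slots):
        """Record the acknowledged slots and release everything buffered."""
        self.slots = list(acked_slots)
        return self.drain()

    def drain(self):
        """Return the packets that may be sent now, in arrival order."""
        if self.slots is None:
            return []
        sent = list(self.queue)
        self.queue.clear()
        return sent
```

Because packets leave the queue in arrival order, packets that arrived before the ACK are never overtaken by later ones.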
 

8.5.5       State Machines

 
 
 
 
 
 
 
 

8.5.6       Control Signaling Illustration

The following figure illustrates how a circuit is set up in the TDM cloud. The SETUP process is triggered by the detection of a TCP SYN message. The mapping between the incoming and outgoing slot numbers is done while a switch relays the control signals. In this figure, the SETUP succeeds. TEARDOWN and REJECT are handled in a similar fashion.
 
 

9          Appendix - Implementation of the TCP Switch Module in ns

The components of a TCP Switch network are: Gateway Outlink Admission Controller, Gateway Inlink Admission Controller, Core Outlink Admission Controller, Core Inlink Admission Controller, Flow Classifier, TDM Timeslot Classifier, Flow Manager, TDM Link (including TDM Queue and TDM Timeslot), and Reassembler.
 
1. Gateway Outlink Admission Controller 
 
   Functions implemented in the Gateway Outlink Admission Controller are:
   (a) Estimate the bandwidth for a new flow, allocate time slot resources, and create a flow manager to control the new flow. Generate a SETUP control packet and send it to the next node.
   (b) Queue the request packet if no resources are available at the moment. When resources are released, the admission controller allocates them to the pending request. A pending request times out after a certain amount of time.
   (c) Use a timeout to check whether a flow has been idle for a while. If it is idle, release its resources and send a TEARDOWN control packet to the next node.
   (d) Start the flow manager when an ACK control packet is forwarded by the Inlink Admission Controller. Release the resources and delete the flow manager when a REJECT control packet is forwarded by the Inlink Admission Controller.
   (e) Two flow managers are static in the Outlink Admission Controller: one for NonTCP packets and one for control packets. These two flow managers and their timeslot resources are allocated when the network starts and are kept in the admission controller until the network stops.
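Functions (a) and (b) can be illustrated with a small model of the pending-request queue. This is a sketch under assumptions, not the ns module: resources are counted as a number of free slots, pending requests are served in arrival order, and the class name OutlinkAdmission and the parameter pending_timeout are hypothetical.

```python
from collections import deque


class OutlinkAdmission:
    def __init__(self, free_slots, pending_timeout):
        self.free = free_slots            # number of free time slots
        self.pending_timeout = pending_timeout
        self.pending = deque()            # (flow_id, n_slots, deadline)
        self.granted = {}                 # flow_id -> n_slots

    def request(self, flow_id, n_slots, now):
        """Admit the flow if slots are available, otherwise queue the request."""
        if n_slots <= self.free:
            self.free -= n_slots
            self.granted[flow_id] = n_slots
            return True                   # a SETUP would be sent here
        self.pending.append((flow_id, n_slots, now + self.pending_timeout))
        return False

    def release(self, flow_id, now):
        """Free a flow's slots, then serve pending requests that now fit."""
        self.free += self.granted.pop(flow_id)
        started = []
        while self.pending:
            fid, n, deadline = self.pending[0]
            if deadline < now:            # the request has timed out
                self.pending.popleft()
            elif n <= self.free:
                self.pending.popleft()
                self.free -= n
                self.granted[fid] = n
                started.append(fid)
            else:
                break                     # head of line still does not fit
        return started
```

In this model a request that has waited longer than pending_timeout is silently dropped the next time resources are released; what the real module does with an expired request is not specified in the text.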
 
2. Gateway Inlink Admission Controller
 
   Functions implemented in the Gateway Inlink Admission Controller are:
   (a) Handle the incoming SETUP control signal. When a SETUP is received, create a new reassembler and call the TDM timeslot classifier to install the flow's timeslots on the reassembler.
   (b) Handle the incoming RELEASE control signal: delete the reassembler and let the timeslot classifier clear the entries for the timeslots.
   (c) For ACK and REJECT control signals, forward the packet to the corresponding Outlink Admission Controller.
 
3. Core Outlink Admission Controller
   Functions implemented:
   (a) Handle the SETUP control signal: allocate resources for the new flow, create a new flow manager, and let the timeslot classifier install the slots on the flow manager.
   (b) Handle the RELEASE and REJECT control signals: free the timeslot resources, delete the flow manager, and let the timeslot classifier clear the flow manager from the slot entries.
   (c) Maintain two flow managers, one for NonTCP packets and one for control signal packets.
   (d) Maintain a mapping table from each outgoing timeslot to its incoming timeslot and TDM timeslot classifier.
 
4. Core Inlink Admission Controller
   (a) Forward a copy of the REJECT packet to the corresponding Outlink Admission Controller. Then modify the slot list in the REJECT packet and send the REJECT to the next node.
   (b) Modify the slot list in the ACK packet and send the ACK to the next node.
 
5.  Flow Classifier     
The Flow Classifier is a subclass of the flowid classifier. It implements a callback handler so that, when a new flow path is established, the admission controller calls back to the flow classifier and installs the flow manager on the classifier.
 
6. TDM timeslot classifier     
The TDM timeslot classifier classifies TDM cells by their incoming timeslot number. A handler function is implemented so that the admission controller can install the core flow manager on the corresponding timeslots when a new flow path is established and clear it when the flow is released or rejected.
 
7. Flow manager    
The main function of a flow manager in a gateway is to segment IP packets into TDM cells and then wait for the scheduled timeslot to take the TDM cells from its queue. It also records the time of the last received packet. The admission controller uses this time to time out a flow and release its resources.
 
The function of a flow manager in a core switch is to synchronize the sending times of the timeslots that belong to the flow. There is a time shift between an incoming timeslot and its outgoing timeslot, since the two are usually not aligned. The timeslots in a flow may have different time shifts, and the flow manager needs to monitor them to make sure that the packet that arrives first also leaves first.
 
Two flow managers with the segmenting function are present in each core switch and gateway to handle the segmenting of NonTCP packets and control signaling. Other flow managers are dynamically allocated and deallocated by admission control when a flow is established or disconnected.
 
8. Reassembler
The reassembler receives TDM cells and, using the IP header information, reassembles them into an IP packet, which it then passes to the next component.
Two reassemblers are present in each core switch and gateway to handle NonTCP packets and control signaling. Other reassemblers are dynamically allocated and deallocated by admission control when a flow is established or disconnected.
 
9. TDM Link 
A TDM Queue is implemented and connected before the delay link between two nodes. The TDM Queue uses a timer to schedule the next timeslot. Each timeslot belongs to a flow manager. When a timeslot is scheduled, it gets a packet from the flow manager and sends it to the TDM Queue, which passes the packet to the delay link.
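The timeslot scheduling of the TDM Link can be sketched as a loop over frames. The model below is an assumption-laden illustration: the timer is replaced by a simple iteration, flow managers are represented by named queues of cells, and the function name run_frames is hypothetical.

```python
from collections import deque


def run_frames(slot_owner, cell_queues, n_frames):
    """Simulate n_frames TDM frames on one link.

    slot_owner:  list indexed by slot number; the value is the name of the
                 flow manager owning that slot, or None if it is unassigned.
    cell_queues: flow manager name -> deque of cells waiting to be sent.
    Returns the (frame, slot, cell) triples in the order they are sent.
    """
    sent = []
    for frame in range(n_frames):
        for slot, owner in enumerate(slot_owner):
            q = cell_queues.get(owner)
            if q:  # the slot stays idle if it is unowned or its owner has no cell
                sent.append((frame, slot, q.popleft()))
    return sent
```

A flow that owns two of the four slots in a frame drains twice as fast as a flow that owns one, and a slot whose owner has nothing to send simply goes unused in that frame.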

 

10   Reference

[1] IP Switching: ATM Under IP, Peter Newman, Greg Minshall, Tom Lyon, IEEE/ACM Transactions on Networking, 6(2), April 1998, pp. 117-129.
[2] Flow Labelled IP: A Connectionless Approach to ATM, Peter Newman, Tom Lyon, Greg Minshall, IEEE INFOCOM, March 1996.
[3] atm: a Strategy for Integrating IP with ATM, Guru Parulkar, SIGCOMM 1995.
[4] Optical Flow Switching in the NGI, Eytan Modiano, talk at IEEE INFOCOM 2000.
[5] Design, Simulation and Evaluation of TCP Switch, Bo Yang, Feng Wang, 2000.