Why is Packet Size Limited?

The absolute limit on TCP packet size is 64 KB (65,535 bytes), but in practice this is far larger than any packet you will actually see, because the lower layers (e.g., Ethernet) impose smaller maximum packet sizes.
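That 65,535-byte ceiling falls out of the IPv4 header, whose Total Length field is 16 bits wide. A quick illustrative check (Python, values from the IPv4 and Ethernet specs):

```python
# The IPv4 header records the packet's total length (headers included)
# in a 16-bit field, so the largest representable packet is 2^16 - 1 bytes.
IP_TOTAL_LENGTH_BITS = 16
max_ip_packet = 2 ** IP_TOTAL_LENGTH_BITS - 1

# A typical Ethernet payload (MTU) is far smaller:
ETHERNET_MTU = 1500

print(max_ip_packet, ETHERNET_MTU)  # 65535 1500
```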

  1. Why don't we just send one single packet? Why do we need to split content into multiple packets (ignoring the size limit)?
  2. If a lower layer (like the internet layer) has a smaller maximum packet size, what does that have to do with TCP packet size limitations? A higher layer (above the internet layer) can add as much data as it wants.
asked Mar 27, 2022 at 23:12

5 Answers 5

Why don't we just send one single packet? Why do we need to split content into multiple packets (ignoring the size limit)?

That would just lead back to circuit-switched networks like the original PSTN (Public Switched Telephone Network). The government funded research into packet-switched networks (result: Internet) to overcome the limitations of circuit-switched networks.

In a circuit-switched network, or in what you propose, one caller or packet would monopolize the circuit or path until it is done, not giving anyone else or any other process a chance to use it. Breaking things up into smaller packets means that you can share the circuit among callers or processes. Each IP packet is routed independently, so a packet follows a path to the destination regardless of the path any other packet took to the same destination. If the path loses a link, the routers along the path can reroute packets over a different path to the destination, and the sender does not know or care.

The big driver of the government funding was the threat of disaster (including nuclear war, which was a big threat in the 1960s and 1970s). If you are making a call (say to respond to ICBM launches), and the telephone company central office is destroyed, then you lose the call and need to start all over, manually rerouting the call. The same holds true for a giant data packet. If you break things up into smaller packets, and there is an interruption in the path, the rest of the packets can automatically be re-routed around the damage.

So, in the simple case you get to share the circuit or path, and you lose very little in the event of a circuit or path interruption.

If a lower layer (like the internet layer) has a smaller maximum packet size, what does that have to do with TCP packet size limitations? A higher layer (above the internet layer) can add as much data as it wants.

TCP takes a stream of data (which can be very large) and segments it into PDUs (Protocol Data Units) we call segments. The segments fit into IP packets, which fit into data-link protocol frames. TCP is a very large subject, far too large to explain in full on a site like this.

Once you understand the reasons for the different layers in the network stack (abstraction and encapsulation), you will see how this works. Basically, the data-link protocol is responsible for delivering frames within the local network, IP is responsible for delivering packets between networks, and a transport protocol like TCP is responsible for delivering segments between processes on different hosts.
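This division of labor also fixes the arithmetic TCP uses: a segment must fit, together with the IP and TCP headers, inside one link-layer frame. A minimal sketch, assuming the common 20-byte IPv4 and TCP headers with no options:

```python
def max_segment_size(mtu: int, ip_header: int = 20, tcp_header: int = 20) -> int:
    """Largest TCP payload that fits in one IP packet of the given MTU."""
    return mtu - ip_header - tcp_header

# For standard Ethernet (MTU 1500) this gives the familiar MSS of 1460 bytes.
print(max_segment_size(1500))  # 1460
```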

answered Mar 27, 2022 at 23:53 by Ron Maupin ♦

Comments are not for extended discussion; this conversation has been moved to chat. Commented Mar 29, 2022 at 22:07

"In a circuit-switched network, or what you propose, one caller or packet would monopolize the circuit or path until it is done" - this is simply wrong. Circuit switching does not preclude multiplexing, and multiplexing allows several participants to share links. GSM, for example, uses frequency and time multiplexing (and space as well, but that is not relevant here).

Commented Apr 22, 2022 at 10:32

Besides, data-link layers in circuit-switched networks still have frames and frame sizes, which means splitting data has to happen somewhere, and all the same issues arise. The reason people don't think of them is that they think of the telephone network, which transmits voice data, and voice has a very particular way of handling errors, which influences network design. A circuit-switched network that transmits data reliably cannot work like that: it has the same fragmentation/reassembly issue as TCP.

Commented Apr 22, 2022 at 11:04

@Effie, you missed the part about "like the original PSTN." That was a circuit-switched network that did not allow multiplexing; multiplexing on the PSTN arrived within my lifetime. You did not read the comments that were moved to chat; my last two comments explained that. You must be too young to remember it. In any case, you are completely wrong about the original circuit-switched networks.

Commented Apr 22, 2022 at 11:53

Your comments that were moved to chat are not accessible. Also, I fail to see how, in 2022, it is still relevant that some technology was not capable of something around 1960, especially since by 1980 it was. Circuit-switched networks do have frame sizes at layer 2 and do allow multiplexing, so the same questions are equally valid for them. They would also be relevant if TCP were run over a connection-oriented layer 3. The relevant issues are who does fragmentation, who does reassembly, and how missing fragments are handled.

Commented Apr 23, 2022 at 18:45

In general there are several reasons to limit packet size.

That said, the 1500-byte maximum most of the internet uses today is an anachronism.

If a lower layer (like the internet layer) has a smaller maximum packet size, what does that have to do with TCP packet size limitations?

Generally, in a network stack you want to perform each function only once. Splitting a data stream into packets (or a packet into smaller packets) is relatively cheap, but reassembly is relatively expensive, because packets may arrive out of order and packets from multiple data streams may be interleaved.
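The asymmetry is easy to see in code. Splitting is a single pass; reassembly must buffer everything and put it back in order. A toy sketch, using byte offsets as sequence numbers the way TCP does:

```python
def split(stream: bytes, mss: int):
    """Cheap: one pass over the data, emitting (offset, chunk) pairs."""
    return [(i, stream[i:i + mss]) for i in range(0, len(stream), mss)]

def reassemble(segments):
    """Expensive: must buffer all segments and sort by offset,
    because they may arrive out of order or interleaved with other streams."""
    return b"".join(chunk for _, chunk in sorted(segments))

data = b"the quick brown fox jumps over the lazy dog"
segs = split(data, 8)
segs.reverse()                    # simulate out-of-order arrival
print(reassemble(segs) == data)   # True
```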

IP does actually have a mechanism called "fragmentation" which can be used to divide oversized packets from an upper layer into smaller fragments, but there are several issues with it. IPv4 fragmentation suffers in particular from these problems: the loss of any single fragment forces retransmission of the whole packet, reassembly places a buffering burden on the receiver, and fragments are frequently mishandled or blocked by middleboxes such as firewalls and NAT devices.

IPv6 fragmentation solves some of the issues with IPv4 fragmentation but causes some issues of its own: only the sending host may fragment, and packets carrying the fragment extension header are often dropped by routers and firewalls.

Therefore, modern TCP implementations disable IP fragmentation (by setting the "don't fragment" bit in the IP header) and manage packet size themselves. Typically, when setting up a connection, they advertise a "maximum segment size" (MSS) based on the MTU of their local interface. When sending, they initially use a maximum size based on their own interface MTU and the MSS value sent by their peer.

These packets may still be too big for the underlying network; if so, "path MTU discovery" comes into play: the host watches for ICMP packets indicating that the MTU has been exceeded and reduces its packet size accordingly.
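The discovery loop can be sketched as a simulation (hypothetical model; a real stack uses the next-hop MTU value carried inside the ICMP "packet too big" message, just as modeled here):

```python
def path_mtu_discovery(link_mtus, initial_mtu):
    """Simulate PMTUD: a DF-marked probe is dropped by the first link
    whose MTU it exceeds; the resulting ICMP error reports that link's
    MTU, and the sender retries with the smaller size."""
    size = initial_mtu
    while True:
        bottleneck = next((m for m in link_mtus if size > m), None)
        if bottleneck is None:
            return size      # the probe fits every hop: path MTU found
        size = bottleneck    # shrink to the MTU reported by ICMP

# Path: Ethernet -> PPPoE link (MTU 1492) -> Ethernet
print(path_mtu_discovery([1500, 1492, 1500], 1500))  # 1492
```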

Some implementations also implement "blackhole detection", where they reduce the packet size if packet delivery appears to be silently failing. This works around networks that fail to successfully deliver ICMP "packet too big" messages.

So why is the de facto internet MTU stuck at 1500 bytes? There are several reasons.

answered Mar 28, 2022 at 19:19 by Peter Green

Although the answer from Ron Maupin is excellent, I'd like to add something.

My short answer is: TCP needs to adjust to the lower layers' MTU in order to do its job: error control, flow control, congestion control (and more).

Reading your question, I think you are assuming that each layer splits data into as many chunks as it wants: say, Ethernet sends 1500 B frames while TCP sends 2 MB segments, which could then be split into as many IP packets as needed. Am I right?

However, that is not how it works. Even though such a scheme is feasible in a layered model, and HTTP, for instance, usually doesn't care about fragmentation, that is only because HTTP sits on top of a transport layer (TCP) that handles, among other things, re-requesting corrupted or lost segments. This gives HTTP a solid base to do whatever it wants. HTTP transfers can also be "fragmented" with chunked transfer encoding, but that is configured independently of the TCP maximum segment size (MSS).

Coming back to TCP: it has no guarantee that IP will deliver every packet, in order, uncorrupted, without overflowing some reception buffer, or without dozens of other failure modes. The "abstraction" that HTTP enjoys over TCP would therefore be reckless and nonsensical to build directly on top of IP, because how would IP packets be resent if TCP were not numbering and accounting for them one by one? TCP would lose most of its usefulness. Even UDP adapts to the lower layers' MTU, and can at least detect corrupt datagrams using the checksum in its header.
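For reference, the header check UDP and TCP carry is the 16-bit ones'-complement Internet checksum defined in RFC 1071. A sketch (not production code), using the example words from that RFC:

```python
def internet_checksum(data: bytes) -> int:
    """16-bit ones'-complement checksum (RFC 1071), as used by IP/UDP/TCP."""
    if len(data) % 2:            # pad odd-length data with a zero byte
        data += b"\x00"
    total = 0
    for i in range(0, len(data), 2):
        total += (data[i] << 8) | data[i + 1]
    while total >> 16:           # fold carries back into the low 16 bits
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF

msg = b"\x00\x01\xf2\x03\xf4\xf5\xf6\xf7"   # sample 16-bit words from RFC 1071
c = internet_checksum(msg)
print(hex(c))  # 0x220d
# Verification property: a message with its checksum appended sums to zero,
# which is exactly the check a receiver performs.
print(internet_checksum(msg + c.to_bytes(2, "big")))  # 0
```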

In conclusion, the TCP segment size must fit into the IP packet size, which in turn must fit into whatever link protocol is in use, which in today's internet conveniently converges on the 1500 bytes of an Ethernet frame, even if some protocols (like MPLS) do unusual things. (Nobody wants to deal with fragmentation, or with discovering an arbitrary MTU for every route a packet might eventually take.)

I hope I added something useful! I also learned by revisiting these topics.

answered Mar 28, 2022 at 11:26

There are considerations that push the limit higher (efficiency) or lower (reliability, interoperability), discussed in the other answers.

The exact number reflects the state of the technology at the time the protocol was designed (and I think a 64 KB packet was quite optimistic even then).

If we were to design the same protocol today (most network equipment 32-bit, 1 Gbit/s a commodity), 1 or even 2 or 4 megabytes would be a good compromise. If we wanted to make the protocol future-proof, an even higher limit would be chosen.

answered Mar 29, 2022 at 14:44

A 4 megabyte segment/packet/frame would block a 1 Gbps link for 32ms. A real-time protocol might have a hard time competing against one or two such fat streams, I suppose.

Commented Mar 29, 2022 at 17:21

64 KB over a 9600 bps modem takes about a minute. Commented Mar 29, 2022 at 17:29
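Both back-of-envelope figures from the comments check out; the serialization delay is just size times eight over the link rate:

```python
def transmit_ms(size_bytes: int, rate_bps: float) -> float:
    """Serialization delay in milliseconds: how long the link is
    busy sending one packet of the given size."""
    return size_bytes * 8 / rate_bps * 1000

# 4 MB segment on a 1 Gb/s link:
print(transmit_ms(4 * 10**6, 1e9))        # 32.0 ms
# 64 KB packet over a 9600 bps modem:
print(transmit_ms(65535, 9600) / 1000)    # ~54.6 s, about a minute
```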

I believe the crucial detail in this discussion is that the network is unreliable. There is no guarantee that data sent over the network actually gets delivered as-is. This means that network nodes (which nodes exactly depends on the network) have to do error recovery.

I will talk about the case relevant to TCP, namely that data needs to be delivered as-is, i.e., the same bits in the same order [condition (1)].

Network errors can be bit errors (some bits flip in transit: 0 instead of 1, or 1 instead of 0) or chunk errors (think of a chunk as a packet at this point): a chunk goes missing, a chunk arrives more than once, or chunks arrive in a different order.

First, to detect and fix bit errors we use two mechanisms: error-correcting codes and checksums. Error-correcting codes (layer 1) can correct a certain number of errors. The checksum then detects whether errors remain, and chunks whose checksum fails are discarded. After that, we are only concerned with chunk errors. As far as I know, the only meaningful way to recover from chunk errors (under condition (1)) is to retransmit the chunk.

So this brings us to the answer to the first question. If you send one big packet (and the packet is really big) and some data in it is received incorrectly, you must retransmit the entire packet. Depending on the network technology, the probability that each attempt arrives corrupted can be quite large. It therefore makes more sense to split the data into chunks, send the chunks more or less independently, and retransmit only the chunks that were received incorrectly.
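The trade-off can be made concrete with a toy model (hypothetical numbers; a deterministic set of damaged chunks stands in for random corruption):

```python
def bytes_resent_single(total_bytes: int, error: bool) -> int:
    """One giant packet: any error forces retransmitting the whole thing."""
    return total_bytes if error else 0

def bytes_resent_chunked(total_bytes: int, chunk: int, bad_chunks) -> int:
    """Chunked transfer: only the damaged chunks are retransmitted."""
    n_chunks = -(-total_bytes // chunk)              # ceiling division
    last = total_bytes - chunk * (n_chunks - 1)      # final (short) chunk
    return sum(last if i == n_chunks - 1 else chunk for i in bad_chunks)

# A 1 MB transfer in 1460-byte chunks, with 3 chunks damaged in transit:
total = 1_000_000
print(bytes_resent_single(total, error=True))           # 1000000
print(bytes_resent_chunked(total, 1460, {5, 99, 300}))  # 4380
```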

Side note: one of the fundamental design decisions of the TCP/IP protocol suite is that the nodes that do the splitting (fragmentation) and reassembly of data should be the sender and the receiver, not the nodes in between. You can read about it in the classic paper "End-to-End Arguments in System Design" (link).

Now, let's turn to the second question. To understand this we need to understand how layers interact, and what happens if one layer tries to send larger chunks than the layer below supports.

As we established, digital data transmission sends data in chunks. At layer 2 these chunks are called frames. A layer-2 standard defines the minimum and maximum sizes of its frames. These sizes depend on the physical characteristics of the medium, though I am not familiar enough with that topic to say anything definitive. The classic frame-size limits of CSMA/CD Ethernet are a good example to look up.

What happens if layer 3 receives a packet that is larger than the layer-2 chunk? In IPv4, the current node (e.g., a router) splits the packet into smaller fragments and sends those; the final destination then reassembles the fragments into the original larger packet. IP does no error recovery here: if one fragment is not delivered, the whole packet is dropped. Experience has shown this to be very inefficient, so IPv6 removed in-network fragmentation. Instead, the sender gets feedback (an ICMPv6 "packet too big" message) and must split the packet itself, with the receiver reconstructing it.

Now, the main role of TCP is error recovery: the receiver detects which chunks are missing and notifies the sender, which retransmits them. Try to imagine how this could interact with IPv4 fragmentation. TCP splits data into chunks (called segments :)) of size X. Somewhere along the path these chunks can be split into smaller fragments of size Y (Y < X), then reassembled back to size X, which can happen multiple times, and then the receiver still needs to reconstruct the received chunks of size X into the original data. More or less the same functionality is repeated multiple times along the path. It is more efficient if TCP figures out the minimal Y and splits the data into segments of size Y; then the intermediate nodes do not need to do anything.
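In other words, the sender computes Y once, from the narrowest link on the path, and segments at that size, so no intermediate node ever needs to fragment. A sketch with a hypothetical helper (assuming 20-byte IPv4 and TCP headers):

```python
def tcp_segment_size(path_mtus, ip_header=20, tcp_header=20):
    """Segment payload Y, chosen so every IP packet fits the narrowest link."""
    return min(path_mtus) - ip_header - tcp_header

def segment(data: bytes, y: int):
    """Split the stream into segments of at most y bytes."""
    return [data[i:i + y] for i in range(0, len(data), y)]

# Path: Ethernet -> PPPoE (MTU 1492) -> Ethernet
y = tcp_segment_size([1500, 1492, 1500])
print(y)                              # 1452
print(len(segment(b"x" * 10000, y)))  # 7 segments, none needing fragmentation
```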

Note also that the fact that IPv4 can fragment does not change the fact that TCP still has to do its own "fragmentation": whether layer 3 fragments or not does not change the functionality TCP has to provide. On the other hand, for TCP it makes little difference whether the segment size is X or Y. More info in the linked paper.

At this point I would like to make a couple of comments.

First, I disagree with the explanation of circuit switching. In 1960, phone networks were, I believe, analogue, not digital. By 1980, digital phone networks (1) had frames at layer 2, which means the issue of chunk size was just as relevant, and (2) were capable of time-division multiplexing, which means several circuits could share the same path. Even before that, frequency-division multiplexing allowed multiple transmissions to share the same physical path (e.g., radio/TV channels).

The difference between circuit switching and packet switching is the timescale on which the multiplexing can change. If someone sends a lot of data, and is actively sending all the time, packet switching is no better. Usually, however, that is not the case. For example, TCP sends a burst of user input and then the user does nothing. In circuit switching the resources would stay reserved and could not be used by anyone else. In packet switching, each node can redistribute resources on the timescale of a single frame's transmission time. The same applies when a TCP sender pauses because the receiver cannot process packets fast enough (see flow control).

Second, the phone network transmits real-time audio, and transmitting real-time audio is very different from TCP; in particular, condition (1) does not apply. On one hand, it is OK if audio chunks arrive less than 100% correct; on the other hand, error recovery by retransmitting chunks cannot be done (you can read more about retransmission problems here). The point I want to make is this: in a primitive phone network, your phone continuously transmits whatever your microphone records, at regular intervals, so you actually use all the resources of the established circuit. In that case packet switching is actually worse than circuit switching. The reason packet switching is better in practice is that microphones can detect when you are not saying anything, and nothing is sent in that case.