From 44f5324b5d13ef2187729d949eca442689627f39 Mon Sep 17 00:00:00 2001 From: Jerry Chu Date: Tue, 25 Jan 2011 13:46:30 -0800 Subject: TCP: fix a bug that triggers large number of TCP RST by mistake This patch fixes a bug that causes TCP RST packets to be generated on otherwise correctly behaved applications, e.g., no unread data on close,..., etc. To trigger the bug, at least two conditions must be met: 1. The FIN flag is set on the last data packet, i.e., it's not on a separate, FIN only packet. 2. The size of the last data chunk on the receive side matches exactly with the size of buffer posted by the receiver, and the receiver closes the socket without any further read attempt. This bug was first noticed on our netperf based testbed for our IW10 proposal to IETF where a large number of RST packets were observed. netperf's read side code meets the condition 2 above 100%. Before the fix, tcp_data_queue() will queue the last skb that meets condition 1 to sk_receive_queue even though it has fully copied out (skb_copy_datagram_iovec()) the data. Then if condition 2 is also met, tcp_recvmsg() often returns all the copied out data successfully without actually consuming the skb, due to a check "if ((chunk = len - tp->ucopy.len) != 0) {" and "len -= chunk;" after tcp_prequeue_process() that causes "len" to become 0 and an early exit from the big while loop. I don't see any reason not to free the skb whose data have been fully consumed in tcp_data_queue(), regardless of the FIN flag. We won't get there if MSG_PEEK is on. Am I missing some arcane cases related to urgent data? Signed-off-by: H.K. Jerry Chu Signed-off-by: David S. Miller --- net/ipv4/tcp_input.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) (limited to 'net/ipv4/tcp_input.c') diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c index 2549b29b062d..eb7f82ebf4a3 100644 --- a/net/ipv4/tcp_input.c +++ b/net/ipv4/tcp_input.c @@ -4399,7 +4399,7 @@ static void tcp_data_queue(struct sock *sk, struct sk_buff *skb) if (!skb_copy_datagram_iovec(skb, 0, tp->ucopy.iov, chunk)) { tp->ucopy.len -= chunk; tp->copied_seq += chunk; - eaten = (chunk == skb->len && !th->fin); + eaten = (chunk == skb->len); tcp_rcv_space_adjust(sk); } local_bh_disable(); -- cgit v1.2.3 From c24f691b56107feeba076616982093ee2d3c8fb5 Mon Sep 17 00:00:00 2001 From: Yuchung Cheng Date: Mon, 7 Feb 2011 12:57:04 +0000 Subject: tcp: undo_retrans counter fixes MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Fix a bug that undo_retrans is incorrectly decremented when undo_marker is not set or undo_retrans is already 0. This happens when sender receives more DSACK ACKs than packets retransmitted during the current undo phase. This may also happen when sender receives DSACK after the undo operation is completed or cancelled. Fix another bug that undo_retrans is incorrectly incremented when sender retransmits an skb and tcp_skb_pcount(skb) > 1 (TSO). This case is rare but not impossible. Signed-off-by: Yuchung Cheng Acked-by: Ilpo Järvinen Signed-off-by: David S. Miller --- net/ipv4/tcp_input.c | 5 +++-- net/ipv4/tcp_output.c | 2 +- 2 files changed, 4 insertions(+), 3 deletions(-) (limited to 'net/ipv4/tcp_input.c') diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c index eb7f82ebf4a3..65f6c0406245 100644 --- a/net/ipv4/tcp_input.c +++ b/net/ipv4/tcp_input.c @@ -1222,7 +1222,7 @@ static int tcp_check_dsack(struct sock *sk, struct sk_buff *ack_skb, } /* D-SACK for already forgotten data... Do dumb counting. */ - if (dup_sack && + if (dup_sack && tp->undo_marker && tp->undo_retrans && !after(end_seq_0, prior_snd_una) && after(end_seq_0, tp->undo_marker)) tp->undo_retrans--; @@ -1299,7 +1299,8 @@ static u8 tcp_sacktag_one(struct sk_buff *skb, struct sock *sk, /* Account D-SACK for retransmitted packet. */ if (dup_sack && (sacked & TCPCB_RETRANS)) { - if (after(TCP_SKB_CB(skb)->end_seq, tp->undo_marker)) + if (tp->undo_marker && tp->undo_retrans && + after(TCP_SKB_CB(skb)->end_seq, tp->undo_marker)) tp->undo_retrans--; if (sacked & TCPCB_SACKED_ACKED) state->reord = min(fack_count, state->reord); diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c index 406f320336e6..dfa5beb0c1c8 100644 --- a/net/ipv4/tcp_output.c +++ b/net/ipv4/tcp_output.c @@ -2162,7 +2162,7 @@ int tcp_retransmit_skb(struct sock *sk, struct sk_buff *skb) if (!tp->retrans_stamp) tp->retrans_stamp = TCP_SKB_CB(skb)->when; - tp->undo_retrans++; + tp->undo_retrans += tcp_skb_pcount(skb); /* snd_nxt is stored to detect loss of retransmitted segment, * see tcp_input.c tcp_sacktag_write_queue(). -- cgit v1.2.3