During tls_sw_sendmsg_locked, we pre-allocate the encrypted message for the size we're expecting to send during the current iteration, but we may end up sending less, for example when splicing: if we're getting the data from small fragments of memory, we may fill up all the slots in the skmsg with less data than expected. In this case, we need to trim the encrypted message to only the length we actually need, to avoid pushing uninitialized bytes down the underlying TCP socket. Fixes: fe1e81d4f73b ("tls/sw: Support MSG_SPLICE_PAGES") Reported-by: Jann Horn Signed-off-by: Sabrina Dubroca --- net/tls/tls_sw.c | 5 ++++- 1 file changed, 4 insertions(+), 1 deletion(-) diff --git a/net/tls/tls_sw.c b/net/tls/tls_sw.c index daac9fd4be7e..36ca3011ab87 100644 --- a/net/tls/tls_sw.c +++ b/net/tls/tls_sw.c @@ -1112,8 +1112,11 @@ static int tls_sw_sendmsg_locked(struct sock *sk, struct msghdr *msg, goto send_end; tls_ctx->pending_open_record_frags = true; - if (sk_msg_full(msg_pl)) + if (sk_msg_full(msg_pl)) { full_record = true; + sk_msg_trim(sk, msg_en, + msg_pl->sg.size + prot->overhead_size); + } if (full_record || eor) goto copied; -- 2.51.0 If we hit an error during the main loop of tls_sw_sendmsg_locked (eg failed allocation), we jump to send_end and immediately return. Previous iterations may have queued async encryption requests that are still pending. We should wait for those before returning, as we could otherwise be reading from memory that userspace believes we're not using anymore, which would be a sort of use-after-free. This is similar to what tls_sw_recvmsg already does: failures during the main loop jump to the "wait for async" code, not straight to the unlock/return. Fixes: a42055e8d2c3 ("net/tls: Add support for async encryption of records for performance") Reported-by: Jann Horn Signed-off-by: Sabrina Dubroca --- net/tls/tls_sw.c | 7 ++++--- 1 file changed, 4 insertions(+), 3 deletions(-) diff --git a/net/tls/tls_sw.c b/net/tls/tls_sw.c index 36ca3011ab87..1478d515badc 100644 --- a/net/tls/tls_sw.c +++ b/net/tls/tls_sw.c @@ -1054,7 +1054,7 @@ static int tls_sw_sendmsg_locked(struct sock *sk, struct msghdr *msg, if (ret == -EINPROGRESS) num_async++; else if (ret != -EAGAIN) - goto send_end; + goto end; } } @@ -1226,8 +1226,9 @@ static int tls_sw_sendmsg_locked(struct sock *sk, struct msghdr *msg, goto alloc_encrypted; } +send_end: if (!num_async) { - goto send_end; + goto end; } else if (num_zc || eor) { int err; @@ -1245,7 +1246,7 @@ static int tls_sw_sendmsg_locked(struct sock *sk, struct msghdr *msg, tls_tx_records(sk, msg->msg_flags); } -send_end: +end: ret = sk_stream_error(sk, msg->msg_flags, ret); return copied > 0 ? copied : ret; } -- 2.51.0 When userspace wants to send a non-DATA record (via the TLS_SET_RECORD_TYPE cmsg), we need to send any pending data from a previous MSG_MORE send() as a separate DATA record. If that DATA record is encrypted asynchronously, tls_handle_open_record will return -EINPROGRESS. This is currently treated as an error by tls_process_cmsg, and it will skip setting record_type to the correct value, but the caller (tls_sw_sendmsg_locked) handles that return value correctly and proceeds with sending the new message with an incorrect record_type (DATA instead of whatever was requested in the cmsg). Always set record_type before handling the open record. If tls_handle_open_record returns an error, record_type will be ignored. If it succeeds, whether with synchronous crypto (returning 0) or asynchronous (returning -EINPROGRESS), the caller will proceed correctly. Fixes: a42055e8d2c3 ("net/tls: Add support for async encryption of records for performance") Reported-by: Jann Horn Signed-off-by: Sabrina Dubroca --- net/tls/tls_main.c | 7 ++----- 1 file changed, 2 insertions(+), 5 deletions(-) diff --git a/net/tls/tls_main.c b/net/tls/tls_main.c index a3ccb3135e51..39a2ab47fe72 100644 --- a/net/tls/tls_main.c +++ b/net/tls/tls_main.c @@ -255,12 +255,9 @@ int tls_process_cmsg(struct sock *sk, struct msghdr *msg, if (msg->msg_flags & MSG_MORE) return -EINVAL; - rc = tls_handle_open_record(sk, msg->msg_flags); - if (rc) - return rc; - *record_type = *(unsigned char *)CMSG_DATA(cmsg); - rc = 0; + + rc = tls_handle_open_record(sk, msg->msg_flags); break; default: return -EINVAL; -- 2.51.0 Async decryption calls tls_strp_msg_hold to create a clone of the input skb to hold references to the memory it uses. If we fail to allocate that clone, proceeding with async decryption can lead to various issues (UAF on the skb, writing into userspace memory after the recv() call has returned). In this case, wait for all pending decryption requests. Fixes: 84c61fe1a75b ("tls: rx: do not use the standard strparser") Reported-by: Jann Horn Signed-off-by: Sabrina Dubroca --- net/tls/tls_sw.c | 6 ++++-- 1 file changed, 4 insertions(+), 2 deletions(-) diff --git a/net/tls/tls_sw.c b/net/tls/tls_sw.c index 1478d515badc..e3d852091e7a 100644 --- a/net/tls/tls_sw.c +++ b/net/tls/tls_sw.c @@ -1641,8 +1641,10 @@ static int tls_decrypt_sg(struct sock *sk, struct iov_iter *out_iov, if (unlikely(darg->async)) { err = tls_strp_msg_hold(&ctx->strp, &ctx->async_hold); - if (err) - __skb_queue_tail(&ctx->async_hold, darg->skb); + if (err) { + err = tls_decrypt_async_wait(ctx); + darg->async = false; + } return err; } -- 2.51.0 With async crypto, we rely on tx_work to actually transmit records once encryption completes. But while send() is running, both the tx_lock and socket lock are held, so tx_work_handler cannot process the queue of encrypted records, and simply reschedules itself. During a large send(), this could last a long time, and use a lot of memory. Transmit any pending encrypted records before restarting the main loop of tls_sw_sendmsg_locked. Fixes: a42055e8d2c3 ("net/tls: Add support for async encryption of records for performance") Reported-by: Jann Horn Signed-off-by: Sabrina Dubroca --- Paolo suggests that we could also add memory accounting to sk_msg_alloc, and reclaim that memory just before pushing the data to tcp_sendmsg_locked. net/tls/tls_sw.c | 13 +++++++++++++ 1 file changed, 13 insertions(+) diff --git a/net/tls/tls_sw.c b/net/tls/tls_sw.c index e3d852091e7a..d17135369980 100644 --- a/net/tls/tls_sw.c +++ b/net/tls/tls_sw.c @@ -1152,6 +1152,13 @@ static int tls_sw_sendmsg_locked(struct sock *sk, struct msghdr *msg, } else if (ret != -EAGAIN) goto send_end; } + + /* Transmit if any encryptions have completed */ + if (test_and_clear_bit(BIT_TX_SCHEDULED, &ctx->tx_bitmask)) { + cancel_delayed_work(&ctx->tx_work.work); + tls_tx_records(sk, msg->msg_flags); + } + continue; rollback_iter: copied -= try_to_copy; @@ -1207,6 +1214,12 @@ static int tls_sw_sendmsg_locked(struct sock *sk, struct msghdr *msg, goto send_end; } } + + /* Transmit if any encryptions have completed */ + if (test_and_clear_bit(BIT_TX_SCHEDULED, &ctx->tx_bitmask)) { + cancel_delayed_work(&ctx->tx_work.work); + tls_tx_records(sk, msg->msg_flags); + } } continue; -- 2.51.0 We don't have a test to check that MSG_MORE won't let us merge records of different types across sendmsg calls. Add new tests that check: - MSG_MORE is only allowed for DATA records - a pending DATA record gets closed and pushed before a non-DATA record is processed Signed-off-by: Sabrina Dubroca --- tools/testing/selftests/net/tls.c | 34 +++++++++++++++++++++++++++++++ 1 file changed, 34 insertions(+) diff --git a/tools/testing/selftests/net/tls.c b/tools/testing/selftests/net/tls.c index e788b84551ca..77e4e30b46cc 100644 --- a/tools/testing/selftests/net/tls.c +++ b/tools/testing/selftests/net/tls.c @@ -564,6 +564,40 @@ TEST_F(tls, msg_more) EXPECT_EQ(memcmp(buf, test_str, send_len), 0); } +TEST_F(tls, cmsg_msg_more) +{ + char *test_str = "test_read"; + char record_type = 100; + int send_len = 10; + + /* we don't allow MSG_MORE with non-DATA records */ + EXPECT_EQ(tls_send_cmsg(self->fd, record_type, test_str, send_len, + MSG_MORE), -1); + EXPECT_EQ(errno, EINVAL); +} + +TEST_F(tls, msg_more_then_cmsg) +{ + char *test_str = "test_read"; + char record_type = 100; + int send_len = 10; + char buf[10 * 2]; + int ret; + + EXPECT_EQ(send(self->fd, test_str, send_len, MSG_MORE), send_len); + EXPECT_EQ(recv(self->cfd, buf, send_len, MSG_DONTWAIT), -1); + + ret = tls_send_cmsg(self->fd, record_type, test_str, send_len, 0); + EXPECT_EQ(ret, send_len); + + /* initial DATA record didn't get merged with the non-DATA record */ + EXPECT_EQ(recv(self->cfd, buf, send_len * 2, 0), send_len); + + EXPECT_EQ(tls_recv_cmsg(_metadata, self->cfd, record_type, + buf, sizeof(buf), MSG_WAITALL), + send_len); +} + TEST_F(tls, msg_more_unsent) { char const *test_str = "test_read"; -- 2.51.0 We don't have a test triggering a partial splice caused by a full skmsg. Add one, based on a program by Jann Horn. Use MAX_FRAGS=48 to make sure the skmsg will be full for any allowed value of CONFIG_MAX_SKB_FRAGS (17..45). Signed-off-by: Sabrina Dubroca --- tools/testing/selftests/net/tls.c | 31 +++++++++++++++++++++++++++++++ 1 file changed, 31 insertions(+) diff --git a/tools/testing/selftests/net/tls.c b/tools/testing/selftests/net/tls.c index 77e4e30b46cc..5c6d8215021c 100644 --- a/tools/testing/selftests/net/tls.c +++ b/tools/testing/selftests/net/tls.c @@ -946,6 +946,37 @@ TEST_F(tls, peek_and_splice) EXPECT_EQ(memcmp(mem_send, mem_recv, send_len), 0); } +#define MAX_FRAGS 48 +TEST_F(tls, splice_short) +{ + struct iovec sendchar_iov; + char read_buf[0x10000]; + char sendbuf[0x100]; + char sendchar = 'S'; + int pipefds[2]; + int i; + + sendchar_iov.iov_base = &sendchar; + sendchar_iov.iov_len = 1; + + memset(sendbuf, 's', sizeof(sendbuf)); + + ASSERT_GE(pipe2(pipefds, O_NONBLOCK), 0); + ASSERT_GE(fcntl(pipefds[0], F_SETPIPE_SZ, (MAX_FRAGS + 1) * 0x1000), 0); + + for (i = 0; i < MAX_FRAGS; i++) + ASSERT_GE(vmsplice(pipefds[1], &sendchar_iov, 1, 0), 0); + + ASSERT_EQ(write(pipefds[1], sendbuf, sizeof(sendbuf)), sizeof(sendbuf)); + + EXPECT_EQ(splice(pipefds[0], NULL, self->fd, NULL, MAX_FRAGS + 0x1000, 0), + MAX_FRAGS + sizeof(sendbuf)); + EXPECT_EQ(recv(self->cfd, read_buf, sizeof(read_buf), 0), MAX_FRAGS + sizeof(sendbuf)); + EXPECT_EQ(recv(self->cfd, read_buf, sizeof(read_buf), MSG_DONTWAIT), -1); + EXPECT_EQ(errno, EAGAIN); +} +#undef MAX_FRAGS + TEST_F(tls, recvmsg_single) { char const *test_str = "test_recvmsg_single"; -- 2.51.0