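Netty Zero-Copy Implementation

When zero-copy comes up, many people think only of sendfile. Netty's zero-copy goes well beyond sendfile: it includes several carefully designed mechanisms that cut down on data copying.

I once debugged a production issue where a file transfer service could not raise its throughput, CPU usage was high, and the profiler showed a large share of time spent in array copies. After switching to Netty's zero-copy features, performance roughly doubled.

Today, let's look at how Netty implements zero-copy.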

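1. Overall Design of Netty Zero-Copy

1.1 Layers of Zero-Copy

Netty's zero-copy can be understood at four layers:

Layer	Implementation	Description
I/O layer	sendfile + FileRegion	File-to-network transfer without passing through user space
Memory layer	DirectBuffer + Pool	Less GC pressure, fewer copies
Composition layer	CompositeByteBuf	Combines multiple buffers
View layer	slice() + duplicate()	Shares the underlying data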

1.2 Comparison with JDK NIO

A JDK NIO ByteBuffer always covers one contiguous memory region, whereas Netty's ByteBuf abstraction is more flexible and also allows a buffer built from several regions:

JDK ByteBuffer:
┌────────────────────────────────┐
│    single contiguous region    │
│  ┌──────────────────────────┐  │
│  │                          │  │
│  └──────────────────────────┘  │
└────────────────────────────────┘

Netty ByteBuf:
┌────────────────────────────────┐
│    may span several regions    │
│  ┌────┐ ┌────────┐ ┌────────┐  │
│  │ A  │ │   B    │ │   C    │  │
│  └────┘ └────────┘ └────────┘  │
└────────────────────────────────┘

This design lets Netty "stitch together" multiple ByteBufs without copying their data.
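
To make the "stitching" idea concrete, here is a minimal JDK-only sketch of a read-only composite view. The class name CompositeViewSketch and its methods are hypothetical and are not Netty's implementation; the sketch only shows how a logical index can be mapped onto several arrays that are referenced, never copied.

```java
import java.nio.charset.StandardCharsets;

/** Toy read-only composite: several byte arrays presented as one logical buffer. */
final class CompositeViewSketch {
    private final byte[][] parts;  // references to the original arrays, never copied
    private final int[] starts;    // logical start offset of each part
    private final int length;      // total logical length

    CompositeViewSketch(byte[]... parts) {
        this.parts = parts;
        this.starts = new int[parts.length];
        int offset = 0;
        for (int i = 0; i < parts.length; i++) {
            starts[i] = offset;
            offset += parts[i].length;
        }
        this.length = offset;
    }

    int length() {
        return length;
    }

    /** Maps a logical index to (part, local index) and reads the byte in place. */
    byte getByte(int index) {
        if (index < 0 || index >= length) {
            throw new IndexOutOfBoundsException("index: " + index);
        }
        int part = parts.length - 1;
        while (starts[part] > index) {
            part--;  // linear scan from the end; good enough for a sketch
        }
        return parts[part][index - starts[part]];
    }

    public static void main(String[] args) {
        byte[] header = "HEAD".getBytes(StandardCharsets.US_ASCII);
        byte[] body = "body".getBytes(StandardCharsets.US_ASCII);
        CompositeViewSketch view = new CompositeViewSketch(header, body);

        body[0] = 'B';  // change a source array after composing
        System.out.println(view.length());           // 8
        System.out.println((char) view.getByte(4));  // B, because the view reads the original array
    }
}
```

The price of skipping the copy is the index translation on every access, which is why a composite pays off mainly for large payloads.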

2. CompositeByteBuf: Composing Buffers

2.1 Why Composite Buffers Are Needed

Consider an HTTP response:

HTTP Header:   "HTTP/1.1 200 OK\r\nContent-Length: 1000\r\n\r\n"
HTTP Body:     1000 bytes of data

Concatenating them the traditional way looks like this:

// Traditional approach: requires copying
ByteBuffer header = ByteBuffer.wrap(headerBytes);
ByteBuffer body = ByteBuffer.wrap(bodyBytes);

// Both buffers have to be copied into a new, larger buffer
ByteBuffer combined = ByteBuffer.allocate(header.remaining() + body.remaining());
combined.put(header);
combined.put(body);
combined.flip();  // switch to read mode before sending

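2.2 Using CompositeByteBuf

Netty's CompositeByteBuf combines multiple ByteBufs into one logical buffer:

// Create a CompositeByteBuf
CompositeByteBuf buf = Unpooled.compositeBuffer();

// Add the components (no data is copied!); header and body are ByteBuf instances here
// Pass true so that writerIndex advances; without it the composite reports 0 readable bytes
buf.addComponents(true, header, body);

// Operate on it like a single buffer
// nioBuffers() exposes one ByteBuffer per component; nioBuffer() would merge them by copying
ByteBuffer[] nioBuffers = buf.nioBuffers();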
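
At the JDK level, separate buffers can be handed to a channel in one call through a gathering write, which is one way a composed buffer can reach the wire without being merged first. The sketch below is an assumption-laden illustration, not Netty code: GatheringWriteSketch and writeSeparately are hypothetical names, and a temporary file stands in for the socket so the example runs on its own.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

final class GatheringWriteSketch {
    /** Writes header and body through one gathering channel, without merging them first. */
    static String writeSeparately(String header, String body) {
        ByteBuffer[] parts = {
            ByteBuffer.wrap(header.getBytes(StandardCharsets.UTF_8)),
            ByteBuffer.wrap(body.getBytes(StandardCharsets.UTF_8))
        };
        long remaining = 0;
        for (ByteBuffer part : parts) {
            remaining += part.remaining();
        }
        try {
            Path file = Files.createTempFile("gather", ".bin");
            try {
                try (FileChannel out = FileChannel.open(file, StandardOpenOption.WRITE)) {
                    // A gathering write may be partial, so loop until everything is written
                    while (remaining > 0) {
                        remaining -= out.write(parts);
                    }
                }
                return new String(Files.readAllBytes(file), StandardCharsets.UTF_8);
            } finally {
                Files.deleteIfExists(file);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(writeSeparately("Content-Length: 4\r\n\r\n", "body"));
    }
}
```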

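2.3 Source Walkthrough

// Simplified sketch of the idea, not the verbatim Netty source
public class CompositeByteBuf extends AbstractReferenceCountedByteBuf
    implements Iterable<ByteBuf> {

    // Internally made up of multiple ByteBufs
    // (not final: the array is replaced as components are added)
    private ByteBuf[] components;
    private int componentCount;
    private final int maxNumComponents;

    // Add components
    public CompositeByteBuf addComponents(ByteBuf... buffers) {
        addComponents0(false, buffers);
        return this;
    }

    private CompositeByteBuf addComponents0(boolean increaseIndex, ByteBuf... buffers) {
        for (ByteBuf buffer : buffers) {
            if (buffer == null) continue;

            // Only record a reference to the component; no data is copied
            // (insertComp stands in for the array-insertion logic)
            components = insertComp(components, componentCount++, buffer);

            // Advance the writer index only when asked to
            if (increaseIndex) {
                setIndex(readerIndex(), writerIndex() + buffer.readableBytes());
            }
        }
        return this;
    }
}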

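2.4 Reading and Writing a CompositeByteBuf

CompositeByteBuf buf = Unpooled.compositeBuffer();
buf.addComponents(
    true,                                   // advance writerIndex so the data is readable
    Unpooled.wrappedBuffer(headerBytes),    // Header
    Unpooled.wrappedBuffer(bodyBytes)       // Body
);

System.out.println(buf.readableBytes());  // total size

// Read it like an ordinary buffer
// (this assumes headerBytes starts with a 4-byte length prefix)
int headerLen = buf.readInt();
// readSlice returns a view; readBytes(int) would allocate a new buffer and copy into it
String header = buf.readSlice(headerLen).toString(StandardCharsets.UTF_8);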

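3. wrap (Unpooled.wrappedBuffer): Wrapping an Existing Array

3.1 wrap vs copy

byte[] bytes = "Hello".getBytes();

// wrap: no copy, the buffer references the array directly
// (use setByte here: a wrapped buffer starts full, so writeByte would fail)
ByteBuf wrapped = Unpooled.wrappedBuffer(bytes);
wrapped.setByte(0, 'h');  // the change is visible in the original array!

// copy: creates an independent copy
ByteBuf copied = Unpooled.copiedBuffer(bytes);
copied.setByte(0, 'j');  // the original array is unaffected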
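
The same wrap-versus-copy contrast can be reproduced with the JDK's own ByteBuffer. The sketch below is only an analogy to the Netty calls above: ByteBuffer.wrap plays the role of wrappedBuffer, an explicit put into a fresh buffer plays the role of copiedBuffer, and the class name WrapVersusCopySketch is hypothetical.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

final class WrapVersusCopySketch {
    /** Returns {first byte seen through the wrapped view, first byte seen in the copy}. */
    static char[] observe() {
        byte[] source = "Hello".getBytes(StandardCharsets.US_ASCII);

        ByteBuffer wrapped = ByteBuffer.wrap(source);            // shares the array
        ByteBuffer copied = ByteBuffer.allocate(source.length);
        copied.put(source).flip();                               // owns a private copy

        source[0] = 'J';  // mutate the original array afterwards

        return new char[] {(char) wrapped.get(0), (char) copied.get(0)};
    }

    public static void main(String[] args) {
        char[] seen = observe();
        System.out.println("wrapped sees " + seen[0] + ", copy sees " + seen[1]);
        // wrapped sees J, copy sees H
    }
}
```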

3.2 Source of wrap

// Unpooled.wrappedBuffer (simplified from Netty 4.1)
public static ByteBuf wrappedBuffer(byte[] array) {
    if (array.length == 0) {
        return EMPTY_BUFFER;
    }
    // Returns a heap buffer backed directly by the given array
    return new UnpooledHeapByteBuf(ALLOC, array, array.length);
}

public static ByteBuf wrappedBuffer(byte[] array, int offset, int length) {
    if (length == 0) {
        return EMPTY_BUFFER;
    }
    if (offset == 0 && length == array.length) {
        return wrappedBuffer(array);
    }
    // A sub-range is just a slice (a view) of the wrapped array
    return wrappedBuffer(array).slice(offset, length);
}

// Conceptually, wrapping stores the reference and never copies the data
// (illustrative class, not the real UnpooledHeapByteBuf)
public class HeapByteBuf extends AbstractByteBuf {
    private final byte[] array;
    private final int offset;
    private final int length;

    public HeapByteBuf(byte[] array, int offset, int length) {
        this.array = array;
        this.offset = offset;
        this.length = length;
    }

    @Override
    public byte getByte(int index) {
        return array[index + offset];
    }
}

3.3 Practical Scenarios

// Scenario 1: HTTP response
ByteBuf header = Unpooled.wrappedBuffer(headerBytes);
ByteBuf body = Unpooled.wrappedBuffer(bodyBytes);
ByteBuf response = Unpooled.wrappedBuffer(header, body);

// Scenario 2: merging several small messages into one send
List<ByteBuf> messages = ...;
CompositeByteBuf combined = Unpooled.compositeBuffer();
for (ByteBuf msg : messages) {
    combined.addComponent(true, msg);  // true advances writerIndex so the data is actually sent
}
channel.writeAndFlush(combined);

4. slice and duplicate: View Operations

4.1 slice: Carving Out Part of a Buffer

slice() cuts out part of a buffer and exposes it as a "view":

ByteBuf buf = Unpooled.buffer(10);
buf.writeBytes("HelloWorld".getBytes());

// Slice off the first 5 bytes
ByteBuf slice = buf.slice(0, 5);

System.out.println(slice.toString(UTF_8));  // "Hello"

// The slice and the original buffer share the underlying data
slice.setByte(0, 'h');  // the change is visible in the original buffer
System.out.println(buf.toString(UTF_8));    // "helloWorld"
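
For readers more familiar with java.nio, the JDK offers the same kind of shared view. The following sketch uses only ByteBuffer and assumes nothing about Netty internals; SliceSketch is a hypothetical name. It narrows a window to the first five bytes, slices it, writes through the slice, and then decodes the parent.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

final class SliceSketch {
    static String sliceAndModify() {
        ByteBuffer buf = ByteBuffer.wrap("HelloWorld".getBytes(StandardCharsets.US_ASCII));

        // Narrow a window to the first 5 bytes, then slice it off as a view
        ByteBuffer window = buf.duplicate();
        window.limit(5);
        ByteBuffer slice = window.slice();

        slice.put(0, (byte) 'h');  // write through the view

        // Decode the parent: the change is visible because the storage is shared
        return StandardCharsets.US_ASCII.decode(buf.duplicate()).toString();
    }

    public static void main(String[] args) {
        System.out.println(sliceAndModify());  // helloWorld
    }
}
```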

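4.2 When to Use slice

// Scenario: handling chunked transfer (simplified here to a 4-byte length prefix)
ByteBuf chunk = Unpooled.buffer();
client.read(chunk);

// Extract the header (the first 4 bytes hold the length)
ByteBuf lengthPart = chunk.slice(0, 4);
int bodyLength = lengthPart.readInt();

// Extract the actual content
ByteBuf body = chunk.slice(4, bodyLength);
process(body);

// No copy needed: the slices share the chunk's data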

4.3 duplicate: A View of the Whole Buffer

duplicate() creates a view over the entire buffer:

ByteBuf buf = Unpooled.buffer(10);
buf.writeBytes("Hello".getBytes());

// duplicate creates a full view with its own reader and writer indices
ByteBuf dup = buf.duplicate();

// Changes made through dup are visible in buf
dup.setByte(0, 'h');
System.out.println((char) buf.getByte(0));  // h

// Changes made through buf are visible in dup
buf.setByte(0, 'H');
System.out.println((char) dup.getByte(0));  // H
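
What a duplicate shares and what it keeps private is easy to check with the JDK's ByteBuffer.duplicate(), which follows the same idea: shared content, independent position. This is a JDK analogy under that assumption, not Netty code, and DuplicateSketch is a hypothetical name.

```java
import java.nio.ByteBuffer;

final class DuplicateSketch {
    /** Returns {position of buf, position of dup, value at index 3 seen through buf}. */
    static int[] independentPositions() {
        ByteBuffer buf = ByteBuffer.wrap(new byte[] {10, 20, 30, 40});
        ByteBuffer dup = buf.duplicate();

        dup.get();               // advances only dup's position
        dup.put(3, (byte) 99);   // but the write lands in the shared storage

        return new int[] {buf.position(), dup.position(), buf.get(3)};
    }

    public static void main(String[] args) {
        int[] r = independentPositions();
        System.out.println(r[0] + " " + r[1] + " " + r[2]);  // 0 1 99
    }
}
```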

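4.4 Slices and Reference Counting

A slice shares its reference count with the original buffer:

ByteBuf buf = Unpooled.buffer(10);
buf.retain();  // reference count +1 (now 2)

ByteBuf slice = buf.slice(0, 5);
// slice() does not increase the reference count

// Releasing the slice
slice.release();  // reference count -1 on the shared counter
System.out.println(buf.refCnt());  // 1, not freed yet

// Retain the slice before handing it to another owner; the count is still shared with buf
ByteBuf retainedSlice = buf.slice(0, 5).retain();
retainedSlice.release();  // releases the reference taken for retainedSlice
buf.release();  // releases the original buf (count reaches 0)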
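
The shared counter is the part that most often surprises people, so here is a toy model of it. This is a deliberately simplified sketch, not Netty's reference-counting code: the Buf class below is hypothetical and only models the rule that a buffer and its slices decrement the same counter.

```java
import java.util.concurrent.atomic.AtomicInteger;

final class SharedRefCountSketch {
    /** Toy buffer: a parent and all of its slices share one counter. */
    static final class Buf {
        private final AtomicInteger refCnt;

        Buf() {
            this(new AtomicInteger(1));  // a new buffer starts with one reference
        }

        private Buf(AtomicInteger shared) {
            this.refCnt = shared;
        }

        Buf slice() {
            return new Buf(refCnt);  // a slice does not bump the count
        }

        Buf retain() {
            refCnt.incrementAndGet();
            return this;
        }

        int refCnt() {
            return refCnt.get();
        }

        /** Returns true when the shared count reaches zero, i.e. the memory would be freed. */
        boolean release() {
            int now = refCnt.decrementAndGet();
            if (now < 0) {
                throw new IllegalStateException("released more often than retained");
            }
            return now == 0;
        }
    }

    public static void main(String[] args) {
        Buf buf = new Buf();
        Buf slice = buf.slice();
        System.out.println(slice.release());  // true: releasing the slice freed the parent too
        System.out.println(buf.refCnt());     // 0
    }
}
```

In this model, a slice that is passed to another owner needs its own retain() first, otherwise whoever releases first frees the memory for everyone.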

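5. FileRegion: Zero-Copy for Files

5.1 What FileRegion Does

FileRegion is used for file transfer. Under the hood it relies on FileChannel.transferTo(), which on Linux is typically backed by sendfile:

// Send a file with FileRegion
FileInputStream in = new FileInputStream("bigfile.zip");
FileChannel channel = in.getChannel();

// FileRegion wraps the transferTo/sendfile path
FileRegion region = new DefaultFileRegion(channel, 0, channel.size());

// Written straight to the Channel without passing through user space
ctx.write(region);
ctx.flush();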

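5.2 Source Walkthrough

// Simplified sketch, not the verbatim Netty source
public class DefaultFileRegion extends AbstractReferenceCounted
    implements FileRegion {

    private final FileChannel channel;
    private final long position;  // start of the region within the file
    private final long count;     // number of bytes in the region

    @Override
    public long transferTo(WritableByteChannel target, long position) throws IOException {
        // The position argument is relative to the start of the region,
        // so offset it by the region's base and shrink the remaining count
        long written = channel.transferTo(this.position + position, count - position, target);
        // Delegates directly to FileChannel.transferTo,
        // which typically uses sendfile on Linux
        return written;
    }
}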

5.3 Compared with Ordinary File Sending

// ❌ Ordinary approach: the data passes through user space
FileInputStream in = new FileInputStream("bigfile.zip");
FileChannel channel = in.getChannel();
ByteBuffer buf = ByteBuffer.allocate(1024);

// Data path: disk → kernel buffer → user buffer → socket buffer → NIC
while (channel.read(buf) != -1) {
    buf.flip();
    while (buf.hasRemaining()) {
        socketChannel.write(buf);  // a single write may be partial
    }
    buf.clear();
}

// ✅ FileRegion approach: the data never enters user space
FileRegion region = new DefaultFileRegion(channel, 0, channel.size());
ctx.write(region);
// Data path: disk → kernel buffer → NIC (zero-copy)
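
The call that FileRegion ultimately depends on can be tried with the JDK alone. The sketch below is a standalone illustration with hypothetical names (TransferToSketch, transferAll, copyViaTransferTo). It transfers between two temporary files so it runs without a network, which means it demonstrates the transferTo loop rather than sendfile itself: whether the operating system uses sendfile depends on the platform and on the target channel.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

final class TransferToSketch {
    /** Moves every byte of src into dst with transferTo, never reading into a Java buffer. */
    static long transferAll(FileChannel src, FileChannel dst) throws IOException {
        long position = 0;
        long size = src.size();
        while (position < size) {
            // transferTo may move fewer bytes than requested, so continue from where it stopped
            long moved = src.transferTo(position, size - position, dst);
            if (moved <= 0) {
                break;  // nothing more can be transferred right now
            }
            position += moved;
        }
        return position;
    }

    /** Round-trips text through two temp files and returns what arrived at the destination. */
    static String copyViaTransferTo(String text) {
        try {
            Path in = Files.createTempFile("region-src", ".bin");
            Path out = Files.createTempFile("region-dst", ".bin");
            try {
                Files.write(in, text.getBytes(StandardCharsets.UTF_8));
                try (FileChannel src = FileChannel.open(in, StandardOpenOption.READ);
                     FileChannel dst = FileChannel.open(out, StandardOpenOption.WRITE)) {
                    transferAll(src, dst);
                }
                return new String(Files.readAllBytes(out), StandardCharsets.UTF_8);
            } finally {
                Files.deleteIfExists(in);
                Files.deleteIfExists(out);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(copyViaTransferTo("zero-copy payload"));
    }
}
```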

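6. PooledByteBufAllocator: Pooled Memory

6.1 Why Pooling Is Needed

Without pooling, every ByteBuf costs:

A memory allocation
A garbage collection after use

In high-performance scenarios this overhead is not negligible.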

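6.2 How Pooling Works

Netty's PooledByteBufAllocator reuses ByteBuf memory through a pool:

// The pooled allocator is the default
ByteBuf buf = PooledByteBufAllocator.DEFAULT.buffer(1024);

// release() returns the memory to the pool instead of leaving it to the GC
buf.release();

// The next allocation may be served straight from the pool
ByteBuf newBuf = PooledByteBufAllocator.DEFAULT.buffer(1024);  // reused!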
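
The acquire/release/reuse cycle can be modeled in a few lines. The sketch below is a toy free-list, far simpler than Netty's real allocator and not derived from its source: BufferPoolSketch is a hypothetical class, it handles a single buffer size, and it is not thread-safe.

```java
import java.nio.ByteBuffer;
import java.util.ArrayDeque;

final class BufferPoolSketch {
    private final ArrayDeque<ByteBuffer> free = new ArrayDeque<>();
    private final int bufferSize;
    private int allocations;  // how many buffers were really allocated

    BufferPoolSketch(int bufferSize) {
        this.bufferSize = bufferSize;
    }

    /** Hands out a pooled buffer if one is free, otherwise allocates a new one. */
    ByteBuffer acquire() {
        ByteBuffer buf = free.pollFirst();
        if (buf == null) {
            allocations++;
            buf = ByteBuffer.allocateDirect(bufferSize);
        }
        return buf;
    }

    /** Returns a buffer to the pool; the caller must not use it afterwards. */
    void release(ByteBuffer buf) {
        buf.clear();  // resets position and limit; the contents are not zeroed
        free.addFirst(buf);
    }

    int allocations() {
        return allocations;
    }

    public static void main(String[] args) {
        BufferPoolSketch pool = new BufferPoolSketch(1024);
        ByteBuffer first = pool.acquire();
        pool.release(first);
        ByteBuffer second = pool.acquire();
        System.out.println(first == second);     // true: the same buffer came back
        System.out.println(pool.allocations());  // 1
    }
}
```

The sketch also shows why a missed release matters with pooling: a buffer that is never returned is never reused, so the pool keeps allocating.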

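6.3 Pooled vs Unpooled Performance

Allocation pattern:

Unpooled (a fresh allocation every time):
  alloc → GC → alloc → GC → alloc → ...

Pooled (reuse):
  alloc → release → reuse → release → reuse → ...

Memory usage:
  Unpooled: keeps growing until the GC runs
  Pooled: stays at a stable size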

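6.4 Advantages of Direct Memory

// JVM option
// -XX:MaxDirectMemorySize=1g

// By default Netty prefers direct memory
ByteBuf directBuf = PooledByteBufAllocator.DEFAULT.directBuffer(1024);

// Advantages of direct memory:
// 1. Saves one heap-to-off-heap copy on socket I/O
// 2. The contents live outside the Java heap, so the GC neither moves nor scans them
// 3. A better fit for network I/O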
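
The heap/direct distinction exists in plain java.nio as well, and the first advantage in the list comes from there: in common JDK implementations, a heap buffer is first copied into a temporary direct buffer before a socket write, while a direct buffer can be passed to the operating system as-is. The sketch below only inspects the two buffer kinds; DirectVersusHeapSketch is a hypothetical name.

```java
import java.nio.ByteBuffer;

final class DirectVersusHeapSketch {
    /** Returns {heap.isDirect, heap.hasArray, direct.isDirect, direct.hasArray}. */
    static boolean[] describe() {
        ByteBuffer heap = ByteBuffer.allocate(1024);          // backed by a byte[] on the Java heap
        ByteBuffer direct = ByteBuffer.allocateDirect(1024);  // backed by native memory

        return new boolean[] {
            heap.isDirect(), heap.hasArray(),
            direct.isDirect(), direct.hasArray()
        };
    }

    public static void main(String[] args) {
        boolean[] d = describe();
        System.out.println("heap:   direct=" + d[0] + ", hasArray=" + d[1]);
        System.out.println("direct: direct=" + d[2] + ", hasArray=" + d[3]);
    }
}
```

Direct buffers are slower to allocate than heap buffers, which is one reason they are usually combined with pooling.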

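7. An Intuitive Analogy

Think of Netty's zero-copy operations as LEGO bricks:

Operation	Analogy
allocate()	Buy a brand-new LEGO brick
wrap()	Put a brick in a transparent box; it is still the same brick
slice()	Split a brick down the middle; the two halves are still parts of the same brick
composite()	Tape several bricks together; each one stays an individual brick
copy()	Duplicate the brick; the copy is fully independent

wrap, slice, and composite never "manufacture a new brick"; they only change how the bricks are viewed and combined.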

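8. Production Pitfalls

8.1 ❌ Mistake: Modifying an Array That Has Been Wrapped

byte[] sensitiveData = getPassword();
ByteBuf buf = Unpooled.wrappedBuffer(sensitiveData);

// ❌ Any write through buf, malicious or accidental, also changes the original array
// buf.setByte(0, (byte) 'X');

// ✅ For sensitive data, make a copy instead
ByteBuf safeBuf = Unpooled.copiedBuffer(sensitiveData);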

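8.2 ❌ Mistake: Forgetting to Release a CompositeByteBuf

CompositeByteBuf buf = Unpooled.compositeBuffer();
buf.addComponents(true, component1, component2, component3);

if (!channel.isActive()) {
    return;  // ❌ forgot to release: buf never reaches the pipeline, so nothing frees it
}
channel.writeAndFlush(buf);  // once written, Netty owns buf and releases it

Correct approach: make sure the buffer is eventually released, either by Netty after a write or by your own code otherwise:

CompositeByteBuf buf = Unpooled.compositeBuffer();
boolean handedOff = false;
try {
    buf.addComponents(true, component1, component2, component3);
    channel.writeAndFlush(buf);  // ownership passes to Netty, which releases buf after the write
    handedOff = true;
} finally {
    if (!handedOff) {
        // Release only when the buffer was never handed to the pipeline;
        // releasing after a successful writeAndFlush would be a double release
        ReferenceCountUtil.release(buf);
    }
}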

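8.3 ❌ Mistake: Modifying the Original Buffer After slice

ByteBuf buf = Unpooled.buffer(10);
buf.writeBytes("Hello".getBytes());

ByteBuf slice = buf.slice(0, 5);

buf.skipBytes(2);  // consume "He" so there are read bytes to discard

// ❌ Discarding the read bytes shifts the shared data under the slice
buf.discardReadBytes();  // moves the unread bytes to index 0

System.out.println(slice.toString(UTF_8));  // no longer "Hello"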
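
The JDK has a close cousin of this pitfall: ByteBuffer.compact() also shifts unread bytes to the front of the shared storage. The sketch below reproduces the effect with java.nio only, as an analogy to discardReadBytes() rather than a statement about its internals; CompactPitfallSketch is a hypothetical name.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

final class CompactPitfallSketch {
    static String sliceAfterCompact() {
        ByteBuffer buf = ByteBuffer.wrap("Hello".getBytes(StandardCharsets.US_ASCII));
        ByteBuffer slice = buf.slice();  // view over all 5 bytes, taken before any reads

        buf.get();
        buf.get();      // consume "He"
        buf.compact();  // shift the unread "llo" to index 0 of the shared array

        // The slice still covers indices 0..4, but the bytes underneath have moved
        return StandardCharsets.US_ASCII.decode(slice).toString();
    }

    public static void main(String[] args) {
        System.out.println(sliceAfterCompact());  // llolo
    }
}
```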

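9. Interview Follow-Up Chain

Level 1: The Concept of Zero-Copy

The interviewer asks: "How does Netty implement zero-copy?"

Netty's zero-copy works at several levels: CompositeByteBuf combines multiple buffers into one logical buffer without copying data; wrap/slice/duplicate create views rather than copies; FileRegion uses sendfile for file transfer; and pooled direct memory reduces GC pressure and copying.

Level 2: CompositeByteBuf

The interviewer follows up: "How does CompositeByteBuf differ from an ordinary ByteBuf?"

An ordinary ByteBuf is one contiguous block of memory. A CompositeByteBuf is made up of multiple ByteBufs: it can be used like a single block, but its storage is scattered. Adding a component copies no data; it only records a reference.

Level 3: slice vs duplicate

The interviewer follows up: "What is the difference between slice and duplicate?"

slice exposes part of a buffer, while duplicate exposes all of it. Both are views that share the underlying data and keep their own reader and writer indices. slice takes a range; duplicate covers the whole buffer and, despite the name, copies nothing.

Level 4: sendfile

The interviewer follows up: "Why can Netty's FileRegion achieve zero-copy?"

FileRegion calls FileChannel.transferTo() under the hood, which on Linux maps to the sendfile system call. The data moves from the kernel's file cache toward the socket without entering user space; when the NIC supports scatter-gather DMA, only 2 DMA copies remain and the CPU copies no payload.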

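Summary

CompositeByteBuf: combines multiple buffers without copying data
wrap: wraps an existing array without copying
slice: carves up a buffer while sharing the underlying data
duplicate: creates a view of the whole buffer
FileRegion: zero-copy via sendfile
PooledByteBuf: pooled memory, lower allocation overhead
When using zero-copy, watch out for reference counting and data safety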
