We wrap NSMutableData in a copy-on-write struct to understand how the standard library implements its value types.

All basic collection types in the standard library are value types, and copy-on-write is what keeps them efficient. Collection types are among the most commonly used data types, so it is important to understand their performance characteristics. Let's look at how copy-on-write works and implement it by hand.

Reference type

As an example, let's compare Swift's Data (a struct) with NSMutableData (a class) from Foundation. First we initialize an NSMutableData instance with a few bytes.

 import Foundation

let sampleBytes: [UInt8] = [0x0b, 0xad, 0xf0, 0x0d]
let nsData = NSMutableData(bytes: sampleBytes, length: sampleBytes.count)

We declared nsData with let, but let does not make a reference type like NSMutableData immutable. For a reference type, let only means that the variable nsData cannot be pointed at another object; the contents of the object it refers to can still change. In other words, we can still append data to nsData.

 nsData.append(sampleBytes, length: sampleBytes.count)

When we assign the object to another variable and then mutate it through one of them, the change is visible through the other as well, because both variables refer to the same object.

 let nsOtherData = nsData
nsData.append(sampleBytes, length: sampleBytes.count)
// nsOtherData changes too

If we want an independent copy, we have to call mutableCopy. It returns Any, so we need to cast the result to NSMutableData.

 let nsOtherData = nsData.mutableCopy() as! NSMutableData
nsData.append(sampleBytes, length: sampleBytes.count)
// nsOtherData is unchanged
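
The sharing behavior described in this section can be checked with the identity operator `===`. The following is a minimal sketch, assuming Foundation is available; the names `alias` and `independent` are illustrative and do not come from the original snippets.

```swift
import Foundation

let sampleBytes: [UInt8] = [0x0b, 0xad, 0xf0, 0x0d]
let nsData = NSMutableData(bytes: sampleBytes, length: sampleBytes.count)

// Assignment copies the reference, not the bytes.
let alias = nsData
// mutableCopy() returns Any, so a cast is needed to get an NSMutableData back.
let independent = nsData.mutableCopy() as! NSMutableData

nsData.append(sampleBytes, length: sampleBytes.count)

print(alias === nsData)        // true: same object
print(independent === nsData)  // false: separate object
print(nsData.length, alias.length, independent.length)  // 8 8 4
```

The append is visible through `alias` because both names refer to one object, while `independent` keeps its original four bytes.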

Value type

This time we initialize a Data value from sampleBytes.

 let data = Data(bytes: sampleBytes, count: sampleBytes.count)

Because we used let, the compiler does not allow us to call mutating methods such as append on data. To change the value of data, declare it with var.

 var data = Data(bytes: sampleBytes, count: sampleBytes.count)
data.append(contentsOf: sampleBytes)

The main difference between Data and NSData is what happens on assignment or when a value is passed as an argument: Data behaves as if a new copy were made every time, while NSData only creates a new reference, so both references point to the same memory.

When we copy a Data value, all of its fields are copied, but the bytes themselves are not copied immediately. Data holds a reference to the actual storage, so copying the struct only creates a new reference to that storage. The bytes are copied only when we modify the data through one of the references.
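
The observable side of this, value semantics, can be sketched as follows. The names `original` and `copy` are illustrative. Whether and when the underlying bytes are physically copied is an implementation detail of Data and cannot be seen from this code; it only shows that the two values are logically independent.

```swift
import Foundation

let sampleBytes: [UInt8] = [0x0b, 0xad, 0xf0, 0x0d]
let original = Data(sampleBytes)

// Assigning a struct gives `copy` its own value, at least logically.
var copy = original
copy.append(contentsOf: sampleBytes)

print(original.count)  // 4: the original is unaffected
print(copy.count)      // 8
```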

Implement copy-on-write

To understand how copy-on-write works, we implement a Data type ourselves. Internally we use NSMutableData to store the bytes (only to get something working quickly; the real Data uses lower-level storage). Of the mutating methods, we implement only append.

 struct MyData {
    var data = NSMutableData()
    
    func append(_ bytes: [UInt8]) {
        data.append(bytes, length: bytes.count)
    }
}

We can create a MyData value:

 let data = MyData()

To make it easier to print the stored bytes, we make MyData conform to the CustomDebugStringConvertible protocol.

 extension MyData: CustomDebugStringConvertible {
    var debugDescription: String {
        return String(describing: data)
    }
}

Now we can call the append method.

 data.append(sampleBytes)

But this is problematic. MyData is a struct and we declared data with let, so we should not be able to modify its value.

The copy behavior is also wrong, as the code below shows: when we assign data to a new variable, we do not get an independent copy.

 var copy = data
copy.append(sampleBytes)

print(data)
print(copy)
// calling append on copy also changes data

So although MyData is a struct, it does not have value semantics.

When we assign data to a new variable, all fields are copied, but the data field of MyData holds a reference to an NSMutableData, so the variables data and copy now both contain a reference to the same NSMutableData instance.
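
That both structs hold the same object can be confirmed with `===`. This is a small sketch, assuming Foundation; `NaiveData` is a hypothetical name that mirrors the first MyData version so the block can stand alone.

```swift
import Foundation

// Hypothetical name; same shape as the first MyData, without copy-on-write.
struct NaiveData {
    var data = NSMutableData()

    func append(_ bytes: [UInt8]) {
        data.append(bytes, length: bytes.count)
    }
}

let first = NaiveData()
let second = first  // copies the struct, that is, only the reference it holds

second.append([0x0b, 0xad])

print(first.data === second.data)  // true: both structs share one NSMutableData
print(first.data.length)           // 2: the write through `second` shows up in `first`
```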

To solve this, we first handle the 'on-write' part of copy-on-write. When append is called, we make a deep copy of the internal storage before writing to it. Because append now assigns to a stored property, it must be marked mutating; otherwise the compiler does not allow it to modify the struct.

 struct MyData {
    var data = NSMutableData()
    
    mutating func append(_ bytes: [UInt8]) {
        print("make a copy")
        data = data.mutableCopy() as! NSMutableData
        data.append(bytes, length: bytes.count)
    }
}

We now have to declare data with var in order to call append, because the compiler does not allow mutating methods to be called on a let constant.

 var data = MyData()
var copy = data
copy.append(sampleBytes)

Before going further, let's do a small refactoring and move the code that copies the NSMutableData instance into a separate property.

 struct MyData {
    var data = NSMutableData()
    var dataForWriting: NSMutableData {
        mutating get {
            print("make a copy")
            data = data.mutableCopy() as! NSMutableData
            return data
        }
    }
    
    mutating func append(_ bytes: [UInt8]) {
        dataForWriting.append(bytes, length: bytes.count)
    }
}

Make copy-on-write more efficient

Our copy-on-write is naive at the moment: every call to append makes a copy, whether or not we are the only holder of the instance.

 for _ in 0..<10 {
    data.append(sampleBytes)
}
// "make a copy" is printed 10 times

A copy is really only needed when data has been assigned to another variable before append is called. At that point two values share one storage object, so a deep copy is required. After the copy, each value has its own independent storage, so subsequent calls do not need to copy again.

So our MyData struct is correct, but the redundant copies hurt performance. The isKnownUniquelyReferenced function lets us copy only when necessary.

 var dataForWriting: NSMutableData {
    mutating get {
        if isKnownUniquelyReferenced(&data) {
            return data
        }
        print("make a copy")
        data = data.mutableCopy() as! NSMutableData
        return data
    }
}

We have added the isKnownUniquelyReferenced check, yet running the test code still copies on every call. That is because isKnownUniquelyReferenced only works for Swift classes. For an Objective-C object such as NSMutableData it always returns false, so we need to wrap the object in a Swift class.

 final class Box<A> {
    let unbox: A
    init(_ value: A) {
        self.unbox = value
    }
}
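
How isKnownUniquelyReferenced responds to a second reference can be seen in isolation with a plain Swift class. This is a minimal sketch; `Token` is a hypothetical class used only for the check, and `withExtendedLifetime` is there to make sure the second reference is still alive while the check runs.

```swift
// Hypothetical class used only for this check.
final class Token {}

var token = Token()
let uniqueBefore = isKnownUniquelyReferenced(&token)

// A second strong reference; withExtendedLifetime keeps it alive during the check.
let other = token
let uniqueWhileShared = withExtendedLifetime(other) {
    isKnownUniquelyReferenced(&token)
}

print(uniqueBefore)       // true
print(uniqueWhileShared)  // false
```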

We use this Box class to wrap NSMutableData, and MyData finally becomes:

 struct MyData {
    var data = Box(NSMutableData())
    var dataForWriting: NSMutableData {
        mutating get {
            if isKnownUniquelyReferenced(&data) {
                return data.unbox
            }
            print("make a copy")
            data = Box(data.unbox.mutableCopy() as! NSMutableData)
            return data.unbox
        }
    }
    
    mutating func append(_ bytes: [UInt8]) {
        dataForWriting.append(bytes, length: bytes.count)
    }
}

Now our code copies the NSMutableData instance only once.

 var data = MyData()
var copy = data
for _ in 0..<10 {
    data.append(sampleBytes)
}
// Prints "make a copy" once

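
To check the number of copies without reading console output, the same pattern can be sketched in pure Swift. This version swaps NSMutableData for a small Swift class, so no Box is needed. `ByteStorage`, `CountedData`, `generation` and `copiesMade` are hypothetical names introduced for illustration; `generation` simply records how many deep copies led to a given storage object.

```swift
// Hypothetical storage class standing in for NSMutableData.
final class ByteStorage {
    var bytes: [UInt8]
    let generation: Int  // number of deep copies that led to this storage

    init(bytes: [UInt8] = [], generation: Int = 0) {
        self.bytes = bytes
        self.generation = generation
    }

    func copy() -> ByteStorage {
        return ByteStorage(bytes: bytes, generation: generation + 1)
    }
}

struct CountedData {
    private var storage = ByteStorage()

    var count: Int { return storage.bytes.count }
    var copiesMade: Int { return storage.generation }

    mutating func append(_ newBytes: [UInt8]) {
        // Copy only when another value shares the storage.
        if !isKnownUniquelyReferenced(&storage) {
            storage = storage.copy()
        }
        storage.bytes.append(contentsOf: newBytes)
    }
}

var data = CountedData()
let copy = data  // shares storage with `data` until the first write
for _ in 0..<10 {
    data.append([0x0b, 0xad, 0xf0, 0x0d])
}

print(data.copiesMade)         // 1
print(data.count, copy.count)  // 40 0
```

Only the first append finds the storage shared and copies it; the remaining nine calls write in place.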
Arrays and dictionaries in the standard library are implemented in a similar way, but they use lower-level storage. Implementing copy-on-write by hand like this helps us understand their performance characteristics.

Note on copy-on-write

Copy-on-write is very efficient, but it does not help in every situation. The for loop above avoids redundant copies, but if we rewrite it with reduce, it no longer does.

 (0..<10).reduce(data) { result, _ in
    var copy = result
    copy.append(sampleBytes)
    return copy
}

This version makes 10 copies because whenever append is called, two variables, copy and result, refer to the same storage.
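
One possible way around this, not covered by the original text, is the standard library's `reduce(into:_:)`, which hands the closure the accumulator as an inout parameter so that it can be mutated in place. The sketch below compares the two; `ByteStorage`, `CountedData` and `copiesMade` are hypothetical helper names, and the exact copy counts may depend on the compiler version and optimization level, so none are asserted in the comments.

```swift
// Hypothetical storage class; `generation` counts the deep copies behind it.
final class ByteStorage {
    var bytes: [UInt8]
    let generation: Int
    init(bytes: [UInt8] = [], generation: Int = 0) {
        self.bytes = bytes
        self.generation = generation
    }
}

struct CountedData {
    private var storage = ByteStorage()
    var count: Int { return storage.bytes.count }
    var copiesMade: Int { return storage.generation }

    mutating func append(_ newBytes: [UInt8]) {
        if !isKnownUniquelyReferenced(&storage) {
            storage = ByteStorage(bytes: storage.bytes,
                                  generation: storage.generation + 1)
        }
        storage.bytes.append(contentsOf: newBytes)
    }
}

let sample: [UInt8] = [0x0b, 0xad, 0xf0, 0x0d]
let start = CountedData()

// Same shape as the reduce call above: `copy` and `result` share storage.
let viaReduce = (0..<10).reduce(start) { result, _ in
    var copy = result
    copy.append(sample)
    return copy
}

// The accumulator is passed inout, so it can stay uniquely referenced.
let viaReduceInto = (0..<10).reduce(into: start) { result, _ in
    result.append(sample)
}

print(viaReduce.count, viaReduceInto.count)  // 40 40
print(viaReduce.copiesMade, viaReduceInto.copiesMade)
```

Both calls produce the same bytes; the second number printed on the last line should be noticeably smaller than the first.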

So we should watch out for places in our code that produce many unnecessary copies. We rarely write code like this, though, so it is usually not a big problem.


Sunxb