Classic Backtracking Algorithm: Set Partition Problem

After reading this article, you will not only have learned the algorithm routine, but will also be able to solve the following problem on LeetCode:

698. Partition to K Equal Sum Subsets (Medium)

-----------

I have said before that backtracking is the most useful algorithm in a written test: whenever you have no better idea, you can brute-force the problem with backtracking. Even if you cannot pass all the test cases, you can still pass some of them.

The technique of backtracking is not difficult. As mentioned in the previous article Backtracking Algorithm Framework Routine, backtracking is the process of exhaustively traversing a decision tree: you "make a choice" before recursing and "undo the choice" after recursing.

However, even within brute-force enumeration, different approaches have their own advantages and disadvantages.

This article looks at a classic backtracking problem, LeetCode problem 698, "Partition to K Equal Sum Subsets". This problem can help you understand the thinking behind backtracking more deeply and write backtracking functions with ease.

The problem statement is very simple:

You are given an integer array nums and a positive integer k. Determine whether nums can be divided into k non-empty subsets whose element sums are all equal.

The function signature is as follows:

boolean canPartitionKSubsets(int[] nums, int k);

We previously covered a subset partition problem in the article on subset partition as a knapsack problem, but that problem only requires splitting the set into two subsets with equal sums, which can be converted into a knapsack problem and solved with dynamic programming.

However, when the set must be split into more than two equal-sum subsets, the only general approach is brute-force enumeration, and the time complexity is exponential. That makes it a good opportunity to practice backtracking and recursive thinking.

1. Analysis of ideas

First, let's review what we know about permutations and combinations:

1. P(n, k) (often also written A(n, k)) is the number of permutations (arrangements) of k elements chosen from n distinct elements; C(n, k) is the number of combinations of k elements chosen from n distinct elements.

2. The main difference between a "permutation" and a "combination" is whether order matters.

3. The formulas for the number of permutations and combinations are:

P(n, k) = n! / (n - k)!
C(n, k) = n! / (k! * (n - k)!)

Now a question: how is the permutation formula P(n, k) derived? To figure this out, I need to explain a little combinatorics.

The many variants of permutation and combination problems can all be abstracted into a "ball-box model", and P(n, k) can be abstracted into the following scenario:

Put n balls marked with different numbers (the numbers reflect that order matters) into k boxes marked with different numbers (where n >= k, and each box ends up containing exactly one ball); there are P(n, k) different ways in total.

Now, if you were the one putting balls into boxes, how would you do it? There are actually two perspectives.

First, you can take the boxes' perspective: each box must choose a ball.

In this case, the first box can choose any of the n balls, and then the remaining k - 1 boxes must choose among the remaining n - 1 balls:

P(n, k) = n * P(n - 1, k - 1)

Alternatively, you can take the balls' perspective. Because not every ball ends up in a box, there are two cases:

1. The first ball is not put into any box, so the remaining n - 1 balls must be put into k boxes.

2. The first ball is put into one of the k boxes, so the remaining n - 1 balls must be put into the remaining k - 1 boxes.

Combining the two cases, we get:

P(n, k) = P(n - 1, k) + k * P(n - 1, k - 1)

As you can see, the two perspectives yield two different recurrences, but both solve to the well-known factorial form:

P(n, k) = n! / (n - k)!

Solving these recurrences involves a fair amount of mathematics, so I will not go into it here. Interested readers can study combinatorics on their own.
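Without solving the recurrences by hand, we can at least check numerically that both of them agree with the factorial form. The following is a minimal sketch; the class and method names (PermutationRecurrences, byBoxes, byBalls, closedForm) are made up for illustration, and the base cases (one way to fill zero boxes, zero ways when there are fewer balls than boxes) are my own assumptions about how to terminate the recursion.

```java
class PermutationRecurrences {
    // Boxes' perspective: the first box picks one of n balls,
    // then the remaining k - 1 boxes pick among n - 1 balls.
    static long byBoxes(int n, int k) {
        if (k == 0) return 1; // no boxes left: exactly one way (do nothing)
        if (n < k) return 0;  // not enough balls to fill every box
        return n * byBoxes(n - 1, k - 1);
    }

    // Balls' perspective: the first ball is either left out,
    // or placed into one of the k boxes.
    static long byBalls(int n, int k) {
        if (k == 0) return 1;
        if (n < k) return 0;
        return byBalls(n - 1, k) + k * byBalls(n - 1, k - 1);
    }

    // Closed form n! / (n - k)!, computed as n * (n - 1) * ... * (n - k + 1).
    static long closedForm(int n, int k) {
        long result = 1;
        for (int i = 0; i < k; i++) {
            result *= (n - i);
        }
        return result;
    }

    public static void main(String[] args) {
        for (int n = 0; n <= 6; n++) {
            for (int k = 0; k <= n; k++) {
                System.out.printf("P(%d, %d): boxes=%d balls=%d closed=%d%n",
                        n, k, byBoxes(n, k), byBalls(n, k), closedForm(n, k));
            }
        }
    }
}
```

Note that byBalls branches twice per call while byBoxes branches once, so even here the choice of perspective changes how much work the recursion does.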

Back to the main topic: this problem asks us to partition a set into subsets. The subset problem differs from permutation and combination problems, but we can borrow the "ball-box model" abstraction and use two different perspectives to solve this subset partition problem.

To divide the array nums of n numbers into k subsets with equal sums, imagine distributing the n numbers into k "buckets" so that in the end the sums of the numbers in all k buckets are the same.

As mentioned earlier in Backtracking Algorithm Framework Routine, what is the key to the backtracking algorithm?

The key is knowing how to "make choices", so that a recursive function can enumerate them exhaustively.

Then, imitating the derivation of the permutation formula, we can view the assignment of n numbers to k buckets from two perspectives:

Perspective 1: from the viewpoint of the n numbers, each number must choose one of the k buckets to enter.

Perspective 2: from the viewpoint of the k buckets, each bucket must traverse the n numbers in nums and decide whether to put the current number into itself.

What is the difference between these two perspectives, you may ask?

Enumerating from different perspectives yields the same result, but the logic of the solution code is completely different, and so is the efficiency of the algorithm. Comparing the different perspectives helps you understand backtracking more deeply, so let's go through them one by one.

2. From the perspective of the numbers

Everyone knows how to traverse the nums array with a for loop:

for (int index = 0; index < nums.length; index++) {
    System.out.println(nums[index]);
}

Can you recursively traverse an array? It's actually quite simple:

void traverse(int[] nums, int index) {
    if (index == nums.length) {
        return;
    }
    System.out.println(nums[index]);
    traverse(nums, index + 1);
}

Calling traverse(nums, 0) has exactly the same effect as the for loop.

So back to this problem: from the perspective of the numbers, each number chooses one of the k buckets. Written with for loops, it looks like this:

// k buckets (subsets); bucket[i] records the sum of the numbers in bucket i
int[] bucket = new int[k];

// enumerate every number in nums
for (int index = 0; index < nums.length; index++) {
    // enumerate every bucket
    for (int i = 0; i < k; i++) {
        // nums[index] decides whether to enter bucket i
        // ...
    }
}

Rewritten recursively, the logic is as follows:

// k buckets (subsets); bucket[i] records the sum of the numbers in bucket i
int[] bucket = new int[k];

// enumerate every number in nums
void backtrack(int[] nums, int index) {
    // base case
    if (index == nums.length) {
        return;
    }
    // enumerate every bucket
    for (int i = 0; i < bucket.length; i++) {
        // make a choice: put nums[index] into bucket i
        bucket[i] += nums[index];
        // recursively enumerate the choices for the next number
        backtrack(nums, index + 1);
        // undo the choice
        bucket[i] -= nums[index];
    }
}
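To get a feel for how large this decision tree is, here is a small sketch that runs the same make-choice / recurse / undo-choice loop and simply counts how many complete assignments it reaches. The class and method names (AssignmentTreeSize, countLeaves, countAssignments) are hypothetical, and the bucket array is passed as a parameter instead of being a field only to keep the example self-contained.

```java
class AssignmentTreeSize {
    // Counts the leaves of the decision tree: one leaf per way of
    // assigning every number in nums to one of the buckets.
    static long countLeaves(int[] nums, int index, int[] bucket) {
        if (index == nums.length) {
            return 1; // every number has chosen a bucket
        }
        long total = 0;
        for (int i = 0; i < bucket.length; i++) {
            bucket[i] += nums[index];                     // make a choice
            total += countLeaves(nums, index + 1, bucket); // recurse
            bucket[i] -= nums[index];                     // undo the choice
        }
        return total;
    }

    static long countAssignments(int[] nums, int k) {
        return countLeaves(nums, 0, new int[k]);
    }

    public static void main(String[] args) {
        System.out.println(countAssignments(new int[]{1, 2, 3, 4}, 3));
    }
}
```

With no pruning at all, each of the n numbers independently picks one of k buckets, so the count is k^n regardless of the values in nums.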

The backtrack function above only enumerates every assignment and does not solve our problem yet, but it needs just a few improvements:

// main function
boolean canPartitionKSubsets(int[] nums, int k) {
    // rule out some basic cases
    if (k > nums.length) return false;
    int sum = 0;
    for (int v : nums) sum += v;
    if (sum % k != 0) return false;

    // k buckets (subsets); bucket[i] records the sum of the numbers in bucket i
    int[] bucket = new int[k];
    // the sum that each bucket (subset) must reach
    int target = sum / k;
    // enumerate to see whether nums can be divided into k subsets that each sum to target
    return backtrack(nums, 0, bucket, target);
}

// recursively enumerate every number in nums
boolean backtrack(
    int[] nums, int index, int[] bucket, int target) {

    if (index == nums.length) {
        // check whether every bucket sums to target
        for (int i = 0; i < bucket.length; i++) {
            if (bucket[i] != target) {
                return false;
            }
        }
        // nums was successfully divided into k subsets
        return true;
    }

    // enumerate the buckets that nums[index] could enter
    for (int i = 0; i < bucket.length; i++) {
        // pruning: bucket i would overflow
        if (bucket[i] + nums[index] > target) {
            continue;
        }
        // put nums[index] into bucket[i]
        bucket[i] += nums[index];
        // recursively enumerate the choices for the next number
        if (backtrack(nums, index + 1, bucket, target)) {
            return true;
        }
        // undo the choice
        bucket[i] -= nums[index];
    }

    // nums[index] does not fit into any bucket
    return false;
}

With the groundwork above, this code should be easy to understand. Although this solution passes, it takes a long time; in fact, we can optimize it further.

Focus on the recursive part of the backtrack function:

for (int i = 0; i < bucket.length; i++) {
    // pruning
    if (bucket[i] + nums[index] > target) {
        continue;
    }

    if (backtrack(nums, index + 1, bucket, target)) {
        return true;
    }
}

If we make as many cases as possible hit the pruning if branch, we reduce the number of recursive calls and, to some extent, the running time.

How can we hit this if branch as often as possible? Note that our index parameter increases from 0; that is, we traverse the nums array recursively starting from index 0.

If we sort nums in descending order beforehand, the larger numbers are assigned to buckets first, so for the later numbers bucket[i] + nums[index] is larger and the pruning condition triggers more easily.

So we can add a little code to the previous solution:

import java.util.Arrays;

boolean canPartitionKSubsets(int[] nums, int k) {
    // other code unchanged
    // ...
    /* sort nums in descending order */
    Arrays.sort(nums);
    for (int i = 0, j = nums.length - 1; i < j; i++, j--) {
        // swap nums[i] and nums[j]
        int temp = nums[i];
        nums[i] = nums[j];
        nums[j] = temp;
    }
    /*******************/
    return backtrack(nums, 0, bucket, target);
}

Because Arrays.sort cannot sort an int[] in descending order directly, this code sorts in ascending order and then reverses the array.
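Putting the pieces of this section together, a self-contained version might look like the sketch below. The class name NumberPerspectivePartition is made up, the methods are static only so the example can be run directly, and sorting a copy of nums (rather than the caller's array) is my own choice, not something the problem requires.

```java
import java.util.Arrays;

class NumberPerspectivePartition {
    static boolean canPartitionKSubsets(int[] nums, int k) {
        // rule out some basic cases
        if (k > nums.length) return false;
        int sum = 0;
        for (int v : nums) sum += v;
        if (sum % k != 0) return false;
        int target = sum / k;

        // sort a copy in descending order so that pruning triggers earlier
        int[] sorted = nums.clone();
        Arrays.sort(sorted);
        for (int i = 0, j = sorted.length - 1; i < j; i++, j--) {
            int temp = sorted[i];
            sorted[i] = sorted[j];
            sorted[j] = temp;
        }
        return backtrack(sorted, 0, new int[k], target);
    }

    // each number nums[index] chooses one of the buckets
    static boolean backtrack(int[] nums, int index, int[] bucket, int target) {
        if (index == nums.length) {
            // all numbers are placed; check that every bucket sums to target
            for (int b : bucket) {
                if (b != target) return false;
            }
            return true;
        }
        for (int i = 0; i < bucket.length; i++) {
            // pruning: bucket i would overflow
            if (bucket[i] + nums[index] > target) continue;
            bucket[i] += nums[index];   // make a choice
            if (backtrack(nums, index + 1, bucket, target)) return true;
            bucket[i] -= nums[index];   // undo the choice
        }
        // nums[index] does not fit into any bucket
        return false;
    }

    public static void main(String[] args) {
        System.out.println(canPartitionKSubsets(new int[]{4, 3, 2, 3, 5, 2, 1}, 4));
        System.out.println(canPartitionKSubsets(new int[]{1, 2, 3, 4}, 3));
    }
}
```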

3. From the perspective of the buckets

As described at the beginning of the article, when enumerating from the buckets' perspective, each bucket traverses all the numbers in nums and decides whether to put the current number into itself; once a bucket is full, the next bucket takes its turn, until all buckets are full.

This idea can be represented by the following code:

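// until all buckets are full
while (k > 0) {
    // sum of the numbers in the current bucket
    int bucket = 0;
    for (int i = 0; i < nums.length; i++) {
        // decide whether to put nums[i] into the current bucket
        // (pseudocode: add either nums[i] or 0)
        bucket += nums[i] or 0;
        if (bucket == target) {
            // this bucket is full; move on to the next one
            k--;
            break;
        }
    }
}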

We can also rewrite this while loop as a recursive function, though it is slightly more complicated than before. First, write down the signature of a backtrack recursive function:

boolean backtrack(int k, int bucket, 
    int[] nums, int start, boolean[] used, int target);

Don't be intimidated by so many parameters; I will explain them one by one. Once you thoroughly understand this article, you will be able to write such a backtracking function with ease.

The parameters of this backtrack function can be read as follows:

Bucket k is currently deciding whether to take the element nums[start]; the sum of the numbers already in bucket k is bucket; used marks whether each element has already been placed in some bucket; target is the sum that every bucket must reach.

According to this function definition, the backtrack function can be called like this:

boolean canPartitionKSubsets(int[] nums, int k) {
    // rule out some basic cases
    if (k > nums.length) return false;
    int sum = 0;
    for (int v : nums) sum += v;
    if (sum % k != 0) return false;

    boolean[] used = new boolean[nums.length];
    int target = sum / k;
    // bucket k starts out empty and makes its choices starting from nums[0]
    return backtrack(k, 0, nums, 0, used, target);
}

Before implementing the backtrack function, let's restate the logic from the buckets' perspective:

1. Traverse all the numbers in nums and decide which of them go into the current bucket.

2. If the current bucket is full (the sum of its numbers reaches target), let the next bucket start over from step 1.

The following code implements this logic:

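boolean backtrack(int k, int bucket, 
    int[] nums, int start, boolean[] used, int target) {
    // base case
    if (k == 0) {
        // all buckets are full, and every number in nums must have been used,
        // because target == sum / k
        return true;
    }
    if (bucket == target) {
        // the current bucket is full; recursively enumerate the choices for the next bucket,
        // which starts choosing numbers from nums[0]
        return backtrack(k - 1, 0, nums, 0, used, target);
    }

    // starting from start, look for a valid nums[i] to put into the current bucket
    for (int i = start; i < nums.length; i++) {
        // pruning
        if (used[i]) {
            // nums[i] has already been placed in another bucket
            continue;
        }
        if (nums[i] + bucket > target) {
            // the current bucket cannot hold nums[i]
            continue;
        }
        // make a choice: put nums[i] into the current bucket
        used[i] = true;
        bucket += nums[i];
        // recursively decide whether the next number goes into the current bucket
        if (backtrack(k, bucket, nums, i + 1, used, target)) {
            return true;
        }
        // undo the choice
        used[i] = false;
        bucket -= nums[i];
    }
    // tried every number, but the current bucket cannot be filled
    return false;
}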

This code gives the correct answer, but it is very inefficient. Let's see whether there is room for optimization.

First, the buckets are indistinguishable in this problem, but our backtracking algorithm treats them as distinct, which leads to redundant computation.

What does that mean? Our backtracking algorithm, in the final analysis, enumerates all possible combinations and checks whether it can find k buckets (subsets) that each sum to target.

For example, with target = 5, the algorithm first puts 1 and 4 into the first bucket.

Now that the first bucket is full, the algorithm starts filling the second bucket and puts 2 and 3 into it.

By analogy, it continues through the remaining elements, trying to make up more buckets (subsets) whose sum is 5.

But here is the question: what does the algorithm do if it turns out that the remaining elements cannot make up k subsets whose sum is target?

The backtracking algorithm backtracks to the first bucket and starts over. Having found that putting 1 and 4 in the first bucket does not work, it tries putting 2 and 3 in the first bucket.

Now that the first bucket is full, it starts filling the second bucket and puts 1 and 4 into it.

By now you should see the problem: this situation is actually the same as the previous one. In other words, you already know there is no need to keep enumerating, because it is certainly impossible to make up k subsets whose sum is target.

But our algorithm will naively keep enumerating, because from its point of view the first bucket and the second bucket hold different elements, so these are two different situations.
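To see this redundancy in action, here is a small instrumented sketch of the bucket-perspective search. Every time a bucket is filled, it records which elements are currently taken, using a printout of the used flags as the key. The class name DuplicateStateDemo, the helper countFilledStates, and the sample input are all made up for illustration; the input was chosen so that no valid partition exists and the search has to explore everything.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

class DuplicateStateDemo {
    // Same search as the bucket-perspective backtrack, plus a visit counter.
    static boolean backtrack(int k, int bucket, int[] nums, int start,
                             boolean[] used, int target, Map<String, Integer> visits) {
        if (k == 0) return true;
        if (bucket == target) {
            // a bucket was just filled: record which elements are taken so far
            visits.merge(Arrays.toString(used), 1, Integer::sum);
            return backtrack(k - 1, 0, nums, 0, used, target, visits);
        }
        for (int i = start; i < nums.length; i++) {
            if (used[i] || nums[i] + bucket > target) continue;
            used[i] = true;
            if (backtrack(k, bucket + nums[i], nums, i + 1, used, target, visits)) {
                return true;
            }
            used[i] = false;
        }
        return false;
    }

    // Hypothetical helper: runs the search and returns the visit counts.
    // Assumes the sum of nums is divisible by k.
    static Map<String, Integer> countFilledStates(int[] nums, int k) {
        int sum = 0;
        for (int v : nums) sum += v;
        Map<String, Integer> visits = new HashMap<>();
        backtrack(k, 0, nums, 0, new boolean[nums.length], sum / k, visits);
        return visits;
    }

    public static void main(String[] args) {
        // target = 5; only two odd numbers exist, so four buckets of sum 5 are impossible
        int[] nums = {1, 4, 2, 3, 2, 2, 2, 2, 2};
        countFilledStates(nums, 4).forEach((state, count) -> {
            if (count > 1) System.out.println(count + "x " + state);
        });
    }
}
```

On this input, the state in which exactly 1, 4, 2, 3 are taken is reached more than once: first via buckets {1, 4} then {2, 3}, and later via {2, 3} then {1, 4}. Everything explored after the repeated visit is wasted work.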

So how do we make the algorithm smart enough to recognize this situation and avoid the redundant computation?

Notice that the used array is identical in both situations, so the used array can be regarded as the "state" of the backtracking process.

Therefore, we can use a memo to record the state of used whenever a bucket is filled. If the current state of used has appeared before, there is no need to continue enumerating; this prunes the search and avoids redundant computation.

Some readers will ask: used is a boolean array, so how can it be stored as a key? This is a minor issue. For example, we can convert the array into a string, which can then be used as a hash table key.

Look at the code implementation, just change the backtrack function a little bit:

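import java.util.Arrays;
import java.util.HashMap;

// memo that stores the states of the used array
HashMap<String, Boolean> memo = new HashMap<>();

boolean backtrack(int k, int bucket, int[] nums, int start, boolean[] used, int target) {
    // base case
    if (k == 0) {
        return true;
    }
    // convert the state of used into a string such as [true, false, ...]
    // so that it can be stored in the HashMap
    String state = Arrays.toString(used);

    if (bucket == target) {
        // the current bucket is full; recursively enumerate the choices for the next bucket
        boolean res = backtrack(k - 1, 0, nums, 0, used, target);
        // store the current state and its result in the memo
        memo.put(state, res);
        return res;
    }

    if (memo.containsKey(state)) {
        // this state has been computed before; return directly instead of enumerating again
        return memo.get(state);
    }

    // the rest of the logic is unchanged...
}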

After submitting this solution, you will find that it is still relatively slow. This time the cause is not redundant computation in the algorithm's logic, but the way the code is implemented.

Each recursive call converts the used array into a string, which is expensive, so we can optimize further.

Note that the problem's constraints guarantee nums.length <= 16, so the used array has at most 16 elements. We can therefore use the "bitmap" technique and replace the used array with a single int variable used.

Specifically, the i-th bit of the integer used, (used >> i) & 1, being 1 or 0 represents used[i] being true or false.

This not only saves space; the integer used can also be stored directly as a HashMap key, eliminating the cost of converting the array to a string.
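If bit manipulation is unfamiliar, the three operations we need can be isolated as below. This is only an illustrative sketch; the class UsedBitmap and its method names are hypothetical helpers and not part of the solution itself.

```java
class UsedBitmap {
    // true if bit i is 1, i.e. the equivalent of used[i] == true
    static boolean isUsed(int used, int i) {
        return ((used >> i) & 1) == 1;
    }

    // set bit i to 1, i.e. the equivalent of used[i] = true
    static int mark(int used, int i) {
        return used | (1 << i);
    }

    // flip bit i with XOR; this restores 0 only when bit i is currently 1,
    // which holds when undoing a choice that was just made
    static int unmark(int used, int i) {
        return used ^ (1 << i);
    }

    // pack a boolean array (at most 32 elements) into an int
    static int fromArray(boolean[] flags) {
        int used = 0;
        for (int i = 0; i < flags.length; i++) {
            if (flags[i]) used = mark(used, i);
        }
        return used;
    }

    public static void main(String[] args) {
        int used = fromArray(new boolean[]{true, false, true});
        System.out.println(Integer.toBinaryString(used));
        System.out.println(isUsed(used, 1));
    }
}
```

Because an int is passed by value in Java, each recursive call also gets its own copy of the state, which is one reason an int works well as a memo key.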

Take a look at the final solution code:

import java.util.HashMap;

public boolean canPartitionKSubsets(int[] nums, int k) {
    // rule out some basic cases
    if (k > nums.length) return false;
    int sum = 0;
    for (int v : nums) sum += v;
    if (sum % k != 0) return false;

    int used = 0; // use the bitmap technique
    int target = sum / k;
    // bucket k starts out empty and makes its choices starting from nums[0]
    return backtrack(k, 0, nums, 0, used, target);
}

HashMap<Integer, Boolean> memo = new HashMap<>();

boolean backtrack(int k, int bucket,
                  int[] nums, int start, int used, int target) {
    // base case
    if (k == 0) {
        // all buckets are full, and every number in nums must have been used
        return true;
    }
    if (bucket == target) {
        // the current bucket is full; recursively enumerate the choices for the next bucket,
        // which starts choosing numbers from nums[0]
        boolean res = backtrack(k - 1, 0, nums, 0, used, target);
        // cache the result
        memo.put(used, res);
        return res;
    }

    if (memo.containsKey(used)) {
        // avoid redundant computation
        return memo.get(used);
    }

    for (int i = start; i < nums.length; i++) {
        // pruning
        if (((used >> i) & 1) == 1) { // check whether bit i is 1
            // nums[i] has already been placed in another bucket
            continue;
        }
        if (nums[i] + bucket > target) {
            continue;
        }
        // make a choice
        used |= 1 << i; // set bit i to 1
        bucket += nums[i];
        // recursively decide whether the next number goes into the current bucket
        if (backtrack(k, bucket, nums, i + 1, used, target)) {
            return true;
        }
        // undo the choice
        used ^= 1 << i; // use XOR to restore bit i to 0
        bucket -= nums[i];
    }

    return false;
}

At this point, the second approach to this problem is complete.

4. Final summary

Both approaches in this article produce the correct answer, but the first solution, even with the sorting optimization, is noticeably slower than the second. Why?

Let's analyze the time complexity of the two algorithms, assuming nums contains n elements.

Start with the first solution, from the numbers' perspective: there are n numbers, and each number has k buckets to choose from, so the number of possible assignments is k^n and the time complexity is O(k^n).

In the second solution, each bucket traverses the n numbers and has two choices, "take" or "don't take", for each number, giving 2^n combinations; with k buckets, the total time complexity is O(k * 2^n).

Of course, these are rough worst-case upper bounds; the actual running time is certainly much better, given all the pruning we added. Still, the upper bounds show that the first approach is much slower.
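To make the bounds concrete: with n = 16 and k = 4, k^n = 4^16 = 4,294,967,296, while k * 2^n = 4 * 65,536 = 262,144. If you want to observe the difference rather than just compute bounds, one option is to count recursive calls. The sketch below does that for both perspectives; the class PerspectiveComparison, the calls counter, and the helper names are hypothetical, neither version sorts nums, and the counts on any single input are only anecdotal since pruning makes them depend heavily on the data.

```java
import java.util.HashMap;
import java.util.Map;

class PerspectiveComparison {
    static long calls; // recursive calls made by the most recent solve* run

    // returns the per-bucket target, or -1 if no partition can exist
    static int targetOf(int[] nums, int k) {
        if (k > nums.length) return -1;
        int sum = 0;
        for (int v : nums) sum += v;
        return sum % k == 0 ? sum / k : -1;
    }

    static boolean solveByNumbers(int[] nums, int k) {
        calls = 0;
        int target = targetOf(nums, k);
        return target >= 0 && byNumbers(nums, 0, new int[k], target);
    }

    static boolean solveByBuckets(int[] nums, int k) {
        calls = 0;
        int target = targetOf(nums, k);
        return target >= 0 && byBuckets(k, 0, nums, 0, 0, target, new HashMap<>());
    }

    // numbers' perspective: nums[index] chooses a bucket
    static boolean byNumbers(int[] nums, int index, int[] bucket, int target) {
        calls++;
        if (index == nums.length) {
            for (int b : bucket) if (b != target) return false;
            return true;
        }
        for (int i = 0; i < bucket.length; i++) {
            if (bucket[i] + nums[index] > target) continue;
            bucket[i] += nums[index];
            if (byNumbers(nums, index + 1, bucket, target)) return true;
            bucket[i] -= nums[index];
        }
        return false;
    }

    // buckets' perspective with a bitmask state and a memo
    static boolean byBuckets(int k, int bucket, int[] nums, int start,
                             int used, int target, Map<Integer, Boolean> memo) {
        calls++;
        if (k == 0) return true;
        if (bucket == target) {
            boolean res = byBuckets(k - 1, 0, nums, 0, used, target, memo);
            memo.put(used, res);
            return res;
        }
        if (memo.containsKey(used)) return memo.get(used);
        for (int i = start; i < nums.length; i++) {
            if (((used >> i) & 1) == 1 || nums[i] + bucket > target) continue;
            if (byBuckets(k, bucket + nums[i], nums, i + 1,
                          used | (1 << i), target, memo)) return true;
        }
        return false;
    }

    public static void main(String[] args) {
        int[] nums = {1, 4, 2, 3, 2, 2, 2, 2, 2}; // no valid partition for k = 4
        boolean a = solveByNumbers(nums, 4);
        long callsByNumbers = calls;
        boolean b = solveByBuckets(nums, 4);
        long callsByBuckets = calls;
        System.out.println("numbers: " + a + " in " + callsByNumbers + " calls");
        System.out.println("buckets: " + b + " in " + callsByBuckets + " calls");
    }
}
```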

So who says backtracking involves no technique? Although backtracking is brute-force enumeration, there is smart enumeration and inefficient enumeration. The key is whose "perspective" you enumerate from.

In plain terms, we should aim for "small choices, many times": it is better to make more choices than to give each choice too large a space; it is better to choose "one of two" k times than to choose "one of k" once.

This problem has now been solved by enumerating from two perspectives. Although there seems to be a lot of code, the core logic is similar. I hope this article helps you understand backtracking more deeply.
