Repeated DNA Sequences
Problem description: All DNA consists of a series of nucleotides abbreviated 'A', 'C', 'G' and 'T', e.g. "ACGAATTCCG". When studying DNA, it is sometimes very helpful to identify repeated sequences within it.
Write a function to find all substrings of length 10 that occur more than once in the DNA string s.
For examples, please refer to the official LeetCode website.
Source: LeetCode
Link: https://leetcode-cn.com/problems/repeated-dna-sequences/
Copyright belongs to LeetCode (力扣). For commercial reprints, please contact them for official authorization; for non-commercial reprints, please indicate the source.
Solution 1: Hash
First, handle the special case: if the string has fewer than 11 characters, it contains at most one substring of length 10, so no sequence can repeat and an empty list is returned directly.
Otherwise, initialize a map that records each distinct substring of length 10: the key is the substring, and the value indicates whether that substring is a repeated sequence. Then traverse the string with a sliding window of 10 characters, moving one position at a time. If the current substring is not yet in the map, add it as a key with the value false; if it is present and already marked as repeated, skip it; if it is present but not yet marked, mark it as repeated.
Finally, the substrings marked as repeated are the result.
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;
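public class LeetCode_187 {
    /**
     * Hash-based solution.
     *
     * @param s the DNA string
     * @return every substring of length 10 that occurs more than once in s
     */
    public static List<String> findRepeatedDnaSequences(String s) {
        // With fewer than 11 characters there is at most one substring of length 10,
        // so nothing can repeat; return an empty list directly
        if (s == null || s.length() < 11) {
            return new ArrayList<>();
        }
        // Records each distinct substring of length 10: the key is the substring,
        // the value indicates whether that substring is a repeated sequence
        Map<String, Boolean> map = new HashMap<>();
        // The window is 10 characters long and starts at the first character
        int startIndex = 0, endIndex = startIndex + 10;
        // Slide the window until it reaches the last character
        while (endIndex <= s.length()) {
            String substring = s.substring(startIndex, endIndex);
            // If the substring is new, add it as a key; if it exists and is already
            // marked as repeated, skip it; otherwise mark it as repeated
            if (map.containsKey(substring)) {
                if (!map.get(substring)) {
                    map.put(substring, true);
                }
            } else {
                map.put(substring, false);
            }
            startIndex++;
            endIndex++;
        }
        return map.entrySet().stream().filter(e -> e.getValue()).map(Map.Entry::getKey).collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // Test case, expected output: ["AAAAACCCCC","CCCCCAAAAA"]
        // (HashMap does not guarantee iteration order, so the two lines may print in either order)
        for (String str : findRepeatedDnaSequences("AAAAACCCCCAAAAACCCCCCAAAAAGGGTTT")) {
            System.out.println(str);
        }
    }
}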
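As a side note, the same hashing idea can be sketched with two sets instead of a map with Boolean values. This is only an illustrative variant, not part of the solution above: the class name RepeatedDnaSketch and the method name findRepeated are made up for this example. It relies on Set.add returning false when the element is already present, and it uses a LinkedHashSet so that results come out in the order in which each repeat is first detected.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical class and method names, used only for this sketch
class RepeatedDnaSketch {
    static List<String> findRepeated(String s) {
        // Every window of length 10 seen so far
        Set<String> seen = new HashSet<>();
        // Windows seen at least twice, kept in first-detected order
        Set<String> repeated = new LinkedHashSet<>();
        if (s == null) {
            return new ArrayList<>();
        }
        // If s is shorter than 10 characters the loop body never runs
        for (int start = 0; start + 10 <= s.length(); start++) {
            String window = s.substring(start, start + 10);
            // Set.add returns false when the window was already in the set
            if (!seen.add(window)) {
                repeated.add(window);
            }
        }
        return new ArrayList<>(repeated);
    }

    public static void main(String[] args) {
        // Prints [AAAAACCCCC, CCCCCAAAAA]
        System.out.println(findRepeated("AAAAACCCCCAAAAACCCCCCAAAAAGGGTTT"));
    }
}
```

Compared with the map version, this variant needs no explicit length check and no stream filter at the end, at the cost of holding a second set in memory.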
[Daily Message] A single spark can start a prairie fire.