如何找出数组中重复的元素

2014-09-08

这个问题可以变形成多个问题：
1、找出数组中多次出现过的元素。
2、找数组中只出现1次的元素。
3、找到数组中没有出现的元素。例如，整型数组中最小元素是a，最大是b，在a和b之间哪些整数没有在数组中出现过？

hash方法

用hash来统计元素的出现次数，这样就可以轻易得解决问题1和问题2。

# !/usr/bin/env ruby
# coding: UTF-8

arr = [1, 2, 9, 8, 3, 9, 20, 2, 9, 8]

hs = Hash.new
arr.each { |e|
	if hs.has_key?(e)
		hs[e] += 1
	else
		hs[e] = 1
	end
}

p hs

结果是：

{1=>1, 2=>2, 9=>3, 8=>2, 3=>1, 20=>1}

然后遍历Hash对象hs即可。

Hash类在动态语言中都得到了很好的实现，其本身的实现有多种方法。如果要用C语言来实现，代码量会偏大。

位向量

这种方法适合数组元素是整数的情况。基本想法是：如果待处理数组的最小元素是0，定义一个较大的bit类型的新的数组，第1个bit（索引是0）代表数字0，第二个bit代表数字1，依次类推。这些bit的值默认是0。如果待处理数组中某元素为c，则将bit数组的第c+1个元素置1，根据这种方式来处理待处理数组中的所有元素。最终，扫描bit数组，就可以知道那些数字出现过，哪些数字没出现过。这种方法可以解决问题3。

为拓展这种方法可解决问题的范围，可以将新数组定义为整数类型，这样就可以记录某个数字出现的次数。其实也是一种Hash。

代码如下：

# !/usr/bin/env ruby
# coding: UTF-8

arr = [1, 2, 9, 8, 3, 9, 20, 2, 9, 8] # 待处理数组，所有元素必须大于等于0，且为整数

new_arr = Array.new(arr.max+1, 0)

for e in arr
	new_arr[e] += 1
end

for e in arr
	print e, "\t", new_arr[e], "\n"
end

运行结果如下（注意输出结果有重复）：

这种方法可以解决问题1、2、3，也对数组arr进行了排序，问题是空间使用量可能较大（例如数组arr中加入新元素20000，那么new_arr的size将是20001）。

链表+插入排序

上面位向量的方法在遇到极端的情况时，可能会比较浪费内存，要避免这种情况可以使用链表。结合插入排序，实现如下：

# !/usr/bin/env ruby
# coding: UTF-8

arr = [1, 2, 9, 8, 3, 9, 20, 2, 9, 8] # 待处理数组，所有元素必须大于等于0，且为整数

class LinkedList
	
	class Node
		attr_accessor :data, :times,:next
		def initialize(data, next_node=nil)
			@data = data
			@times = 1
			@next = next_node
		end
	end
	
	def initialize
		@root = nil
	end

	def insert(num)
		if @root.nil?
			@root = Node.new(num)
			# puts "debug: @root is nil"
			return
		end
		current_node = @root
		while true

			if current_node.data == num
				current_node.times += 1
				break
			end

			if current_node.next.nil?
				new_node = Node.new(num)
				current_node.next = new_node
				break
			end

			if current_node.data < num && current_node.next.data > num
				new_node = Node.new(num)
				new_node.next = current_node.next
				current_node.next = new_node
				break
			end

			current_node = current_node.next
					
		end
	end

	def show
		current_node = @root
		while not current_node.nil?
			print current_node.data, "\t", current_node.times, "\n"
			current_node = current_node.next
		end
	end
end

linked_list = LinkedList.new
for e in arr
	linked_list.insert(e)
end
linked_list.show

运行结果如下：

排序法

这个方法算是比较巧妙的。先看一下一个数组的排序结果：

# !/usr/bin/env ruby
# coding: UTF-8

arr = [1, 2, 9, 8, 3, 9, 20, 2, 9, 8] 
p arr.sort

运行结果如下：

[1, 2, 2, 3, 8, 8, 9, 9, 9, 20]

对这个运行结果，如果要解决问题1（找出多次出现过的元素），可以这样做：

# !/usr/bin/env ruby
# coding: UTF-8

arr = [1, 2, 9, 8, 3, 9, 20, 2, 9, 8] 
arr = arr.sort
# now arr is [1, 2, 2, 3, 8, 8, 9, 9, 9, 20]

position = 0

while position < arr.size-1
	cur = arr[position]
	if arr[position] == arr[position+1]
		puts arr[position] # 这个数字多次出现
		while arr[position+1] == cur && position < arr.size-1 # 找下一个不同的数
			position += 1
		end
	else
		position += 1
	end
end

运行结果是：

2
8
9

要解决问题2（找数组中只出现1次的元素），找那些和左侧、右侧数字都不相等的数字即可。

要解决问题3（找到数组中没有出现的元素），找到相邻的两个不等的数字，输出这两个数字的中间的数字即可。