I. Optimization Points
- Number of reduce tasks
- Map task output compression
- Shuffle phase parameters
- Virtual CPUs allocated to map and reduce tasks
II. Number of Reduce Tasks
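By default, a job runs a single reduce task.
More reduce tasks is not automatically better: the best number has to be found by testing and tuning against the actual workload. It is set on the job like this: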
job.setNumReduceTasks(2);
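To make concrete what the reduce task count controls: every map output key is routed to exactly one reduce task by the partitioner. The sketch below is plain Java with no Hadoop dependency. It reproduces the formula of Hadoop's default HashPartitioner, `(hashCode & Integer.MAX_VALUE) % numReduceTasks`, but hashes `String` keys, whereas real jobs hash Writable keys such as `Text` whose hash codes differ, so actual assignments will not match. The class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class PartitionDemo {

    // Same formula as Hadoop's default HashPartitioner, applied to String keys
    // (an assumption: real jobs use Writable keys with their own hashCode()).
    static int partitionFor(String key, int numReduceTasks) {
        return (key.hashCode() & Integer.MAX_VALUE) % numReduceTasks;
    }

    // Groups keys by the reduce task (partition number) that would receive them.
    static Map<Integer, List<String>> assign(List<String> keys, int numReduceTasks) {
        Map<Integer, List<String>> byTask = new TreeMap<>();
        for (int i = 0; i < numReduceTasks; i++) {
            byTask.put(i, new ArrayList<>());
        }
        for (String key : keys) {
            byTask.get(partitionFor(key, numReduceTasks)).add(key);
        }
        return byTask;
    }

    public static void main(String[] args) {
        // Hypothetical keys, just to show how the grouping changes with the task count.
        List<String> keys = List.of("hadoop", "yarn", "hdfs", "spark", "hive", "hbase");
        for (int n : new int[] {1, 2, 4}) {
            System.out.println(n + " reduce task(s): " + assign(keys, n));
        }
    }
}
```

Changing the task count changes which keys are grouped together, and a skewed key distribution can leave some reduce tasks with far more work than others, which is one reason the optimum has to be measured rather than guessed.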
III. Map Task Output Compression
This was covered in the previous section.
IV. Shuffle Phase Parameters
For details, see mapred-default.xml.
The tunable parameters include the following:
mapreduce.task.io.sort.factor (default: 10): The number of streams to merge at once while sorting files. This determines the number of open file handles.
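To illustrate what `mapreduce.task.io.sort.factor` limits, here is a simplified, stdlib-only model: sorted runs (standing in for spill files) are merged at most `factor` at a time, so a larger factor means fewer passes over the data but more streams open at once. This is not Hadoop's own merge code; Hadoop's Merger sizes its first pass differently, so real pass counts can differ. Class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.PriorityQueue;

public class MergeFactorDemo {

    // Classic k-way merge of already-sorted runs using a min-heap.
    // Heap entries are {value, runIndex, positionInRun}, ordered by value.
    static List<Integer> kWayMerge(List<List<Integer>> runs) {
        PriorityQueue<int[]> heap = new PriorityQueue<>((a, b) -> Integer.compare(a[0], b[0]));
        for (int r = 0; r < runs.size(); r++) {
            if (!runs.get(r).isEmpty()) {
                heap.add(new int[] {runs.get(r).get(0), r, 0});
            }
        }
        List<Integer> out = new ArrayList<>();
        while (!heap.isEmpty()) {
            int[] top = heap.poll();
            out.add(top[0]);
            List<Integer> run = runs.get(top[1]);
            int next = top[2] + 1;
            if (next < run.size()) {
                heap.add(new int[] {run.get(next), top[1], next});
            }
        }
        return out;
    }

    // Repeatedly merges groups of at most `factor` runs until one run remains.
    static List<Integer> mergeAll(List<List<Integer>> runs, int factor) {
        if (factor < 2) throw new IllegalArgumentException("factor must be >= 2");
        List<List<Integer>> current = new ArrayList<>(runs);
        while (current.size() > 1) {
            List<List<Integer>> next = new ArrayList<>();
            for (int i = 0; i < current.size(); i += factor) {
                // Only this group's streams are "open" during this merge.
                next.add(kWayMerge(current.subList(i, Math.min(i + factor, current.size()))));
            }
            current = next;
        }
        return current.isEmpty() ? new ArrayList<>() : current.get(0);
    }

    // Number of merge rounds the model above performs for `numRuns` runs.
    static int rounds(int numRuns, int factor) {
        if (factor < 2) throw new IllegalArgumentException("factor must be >= 2");
        int rounds = 0;
        while (numRuns > 1) {
            numRuns = (numRuns + factor - 1) / factor; // ceil(numRuns / factor)
            rounds++;
        }
        return rounds;
    }

    public static void main(String[] args) {
        List<List<Integer>> runs = List.of(
            List.of(1, 7, 13), List.of(2, 8), List.of(3, 9, 14),
            List.of(4, 10), List.of(5, 11), List.of(6, 12, 15));
        System.out.println(mergeAll(runs, 2));
        System.out.println("rounds with factor 2: " + rounds(runs.size(), 2));
        System.out.println("rounds with factor 10: " + rounds(runs.size(), 10));
    }
}
```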
mapreduce.task.io.sort.mb (default: 100): The total amount of buffer memory to use while sorting files, in megabytes. By default, gives each merge stream 1MB, which should minimize seeks.
mapreduce.map.sort.spill.percent (default: 0.80): The soft limit in the serialization buffer. Once reached, a thread will begin to spill the contents to disk in the background. Note that collection will not block if this threshold is exceeded while a spill is already in progress, so spills may be larger than this threshold when it is set to less than 0.5.
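The interaction between `mapreduce.task.io.sort.mb` and `mapreduce.map.sort.spill.percent` comes down to simple arithmetic: with the defaults, a background spill starts once about 100 × 0.80 = 80 MB of the buffer is in use. The sketch below turns that into a rough spill-count estimate for a map task with a given (positive) output size. It ignores the per-record metadata that shares the buffer and the overlap between collection and spilling, so treat its numbers as an approximation. The 400 MB map output and all names are hypothetical.

```java
public class SpillEstimate {

    // Buffer usage (MB) at which a background spill starts:
    // mapreduce.task.io.sort.mb * mapreduce.map.sort.spill.percent
    static double spillThresholdMb(int sortMb, double spillPercent) {
        return sortMb * spillPercent;
    }

    // Rough number of spill files for a map task emitting `mapOutputMb` of
    // serialized output. Simplified model: one spill per full threshold, plus
    // the final flush of whatever is left in the buffer.
    static int estimatedSpills(double mapOutputMb, int sortMb, double spillPercent) {
        double threshold = spillThresholdMb(sortMb, spillPercent);
        return (int) Math.ceil(mapOutputMb / threshold);
    }

    public static void main(String[] args) {
        // Defaults from mapred-default.xml: sort.mb = 100, spill.percent = 0.80.
        System.out.println("spill threshold (MB): " + spillThresholdMb(100, 0.80));
        // Hypothetical map task emitting 400 MB, with the default and a doubled buffer.
        System.out.println("spills with 100 MB buffer: " + estimatedSpills(400, 100, 0.80));
        System.out.println("spills with 200 MB buffer: " + estimatedSpills(400, 200, 0.80));
    }
}
```

A larger buffer means fewer spill files and therefore less merging at the end of the map, at the cost of more memory in the map task's heap.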
V. Virtual CPUs Allocated to Map and Reduce Tasks
Map and reduce tasks each get one virtual CPU by default; this can also be adjusted in practice.
1. map
mapreduce.map.cpu.vcores (default: 1): The number of virtual cores required for each map task.
2. reduce
mapreduce.reduce.cpu.vcores (default: 1): The number of virtual cores required for each reduce task.
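Raising `mapreduce.map.cpu.vcores` or `mapreduce.reduce.cpu.vcores` reduces how many tasks can run at the same time on a node, but only if the scheduler actually accounts for CPU. The sketch below assumes it does (for example, the CapacityScheduler configured with DominantResourceCalculator; the default calculator considers memory only) and computes how many equally sized task containers fit on one node. The node capacity (16384 MB and 8 vcores, which in a real cluster would come from `yarn.nodemanager.resource.memory-mb` and `yarn.nodemanager.resource.cpu-vcores`) and the 1024 MB per task are hypothetical values.

```java
public class ContainerFit {

    // Containers of one size that fit on a single node, assuming the scheduler
    // enforces both memory and vcores (e.g. DominantResourceCalculator).
    // With a memory-only calculator, only the byMemory limit would apply.
    static int containersPerNode(int nodeMemoryMb, int nodeVcores,
                                 int taskMemoryMb, int taskVcores) {
        int byMemory = nodeMemoryMb / taskMemoryMb;
        int byCpu = nodeVcores / taskVcores;
        return Math.min(byMemory, byCpu);
    }

    public static void main(String[] args) {
        // Hypothetical node: 16384 MB and 8 vcores offered to YARN; 1024 MB per task.
        System.out.println("vcores = 1: " + containersPerNode(16384, 8, 1024, 1));
        System.out.println("vcores = 2: " + containersPerNode(16384, 8, 1024, 2));
    }
}
```

Giving each task more vcores only pays off when the task code can actually use the extra CPU; otherwise it just lowers the number of concurrent tasks per node.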