ORGANIZATION
• Hadoop description:
• Processes involved
• How it works
• Installing Hadoop
• Local installation
• Overview of a cluster installation
ORGANIZATION
• HDFS:
• Accessing the Hadoop filesystem
• Uploading and downloading data
• Running processes
• Launching, running, and verifying processes (locally)
• Launching, running, and verifying processes (cluster)
INFORMATION REPOSITORY
• https://www.tsc.uc3m.es/~hmolina/mlgp
• Restricted access with the Department password
HADOOP DATA FLOW
Java MapReduce

Having run through how the MapReduce program works, the next step is to express it in code. We need three things: a map function, a reduce function, and some code to run the job. The map function is represented by the Mapper class, which declares an abstract map() method. The example below shows the implementation of our map method.

Example: Mapper for maximum temperature example
import java.io.IOException;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class MaxTemperatureMapper
    extends Mapper<LongWritable, Text, Text, IntWritable> {

  private static final int MISSING = 9999;

  @Override
  public void map(LongWritable key, Text value, Context context)
      throws IOException, InterruptedException {

    String line = value.toString();
    String year = line.substring(15, 19);
    int airTemperature;
    if (line.charAt(87) == '+') { // parseInt doesn't like leading plus signs
      airTemperature = Integer.parseInt(line.substring(88, 92));
    } else {
      airTemperature = Integer.parseInt(line.substring(87, 92));
    }
    String quality = line.substring(92, 93);
    if (airTemperature != MISSING && quality.matches("[01459]")) {
      context.write(new Text(year), new IntWritable(airTemperature));
    }
  }
}
The Mapper class is a generic type, with four formal type parameters that specify the input key, input value, output key, and output value types of the map function. For the present example, the input key is a long integer offset and the input value is a line of text.

Figure: MapReduce logical data flow
22 | Chapter 2: MapReduce
TYPICAL ARCHITECTURE
Figure: MapReduce data flow with a single reduce task

The number of reduce tasks is not governed by the size of the input, but is specified independently. In "The Default MapReduce Job" you will see how to choose the number of reduce tasks for a given job.

When there are multiple reducers, the map tasks partition their output, each creating one partition for each reduce task. There can be many keys (and their associated values) in each partition, but the records for any given key are all in a single partition. The partitioning can be controlled by a user-defined partitioning function, but normally the default partitioner, which buckets keys using a hash function, works very well.
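The default partitioning idea can be sketched in a few lines. This is only an illustration of the rule (all records for a key land in the same partition); Hadoop's actual HashPartitioner uses the key's Java hashCode(), and the hash below is a simple stand-in.

```python
# Sketch of hash partitioning: bucket each key with a stable hash so
# every record for a given key goes to the same reduce task.
def partition(key, num_reducers):
    h = sum(ord(c) for c in key)  # stand-in for key.hashCode()
    return h % num_reducers

records = [("1950", 0), ("1950", 20), ("1949", 111), ("1950", 10)]
partitions = {}
for key, value in records:
    partitions.setdefault(partition(key, 2), []).append((key, value))

# Every ("1950", ...) record ends up in a single partition.
```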
The data flow for the general case of multiple reduce tasks is illustrated in the next figure. This diagram makes it clear why the data flow between map and reduce tasks is colloquially known as "the shuffle," as each reduce task is fed by many map tasks. The shuffle is more complicated than this diagram suggests, and tuning it can have a big impact on job execution time, as you will see in "Shuffle and Sort."
ARCHITECTURE WITH MULTIPLE REDUCERS
Figure: MapReduce data flow with multiple reduce tasks

Finally, it's also possible to have zero reduce tasks. This can be appropriate when you don't need the shuffle because the processing can be carried out entirely in parallel (a few examples are discussed in "NLineInputFormat"). In this case, the only off-node data transfer is when the map tasks write to HDFS (see the next figure).

Combiner Functions

Many MapReduce jobs are limited by the bandwidth available on the cluster, so it pays to minimize the data transferred between map and reduce tasks. Hadoop allows the user to specify a combiner function to be run on the map output; the combiner function's output forms the input to the reduce function. Since the combiner function is an optimization, Hadoop does not provide a guarantee of how many times it will call it for a particular map output record, if at all. In other words, calling the combiner function zero, one, or many times should produce the same output from the reducer.
ARCHITECTURE WITH NO REDUCERS
Figure: MapReduce data flow with no reduce tasks

The contract for the combiner function constrains the type of function that may be used. This is best illustrated with an example. Suppose that for the maximum temperature example, readings for the year 1950 were processed by two maps (because they were in different splits). Imagine the first map produced the output:
(1950, 0)
(1950, 20)
(1950, 10)
And the second produced:
(1950, 25)
(1950, 15)
The reduce function would be called with a list of all the values:
(1950, [0, 20, 10, 25, 15])
with output:
(1950, 25)
since 25 is the maximum value in the list. We could use a combiner function that, just like the reduce function, finds the maximum temperature for each map output. The reduce would then be called with:
(1950, [20, 25])
and the reduce would produce the same output as before. More succinctly, we may express the function calls on the temperature values in this case as follows:
max(0, 20, 10, 25, 15) = max(max(0, 20, 10), max(25, 15)) = max(20, 25) = 25
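The identity above is easy to check directly; a quick Python sanity check of the combiner arithmetic:

```python
# Per-map maxima combined, versus the max over all five readings at once.
map1 = [0, 20, 10]   # output of the first map for 1950
map2 = [25, 15]      # output of the second map

with_combiner = max(max(map1), max(map2))  # the reducer sees [20, 25]
without_combiner = max(map1 + map2)        # the reducer sees all five values

assert with_combiner == without_combiner == 25
```

This is exactly why max works as a combiner: it is associative and commutative, so applying it to partial results does not change the final answer.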
HADOOP
• Several ways to run it:
• Standalone mode: nothing needs to be configured.
• Server mode, local node: a client-server system, but everything runs locally.
• Distributed mode: full infrastructure with several storage nodes, compute nodes, etc.
STANDALONE MODE
• Unpack the Hadoop distribution
• Set the JAVA_HOME variable
• Et voilà!
TEST
• Unpack the Reuters database
• Run the command:

hadoop jar hadoop-examples-1.1.2.jar wordcount dir_reuters dir_output

• The dir_output directory must not already exist
• Note how long it takes
LOCAL SERVER MODE CONFIGURATION
• Create a directory named conf_single
• Copy the contents of conf into conf_single
MASTER SERVER CONFIGURATION

CORE-SITE.XML
• Defines the server that will host the filesystem
<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://localhost:9000</value>
  </property>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/tmp/hadoop/tmp</value>
  </property>
</configuration>
HDFS-SITE.XML
• Defines the behavior of the distributed filesystem

In standalone installations the data is configured not to be replicated. In cluster configurations, the data MUST be replicated.
<configuration>
  <property>
    <name>dfs.name.dir</name>
    <value>/tmp/hadoop/name</value>
  </property>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
  <property>
    <name>dfs.data.dir</name>
    <value>/tmp/hadoop/data</value>
  </property>
</configuration>
JOBTRACKER CONFIGURATION: MAPRED-SITE.XML
• Configures the task coordinator
<configuration>
  <property>
    <name>mapred.job.tracker</name>
    <value>localhost:9001</value>
  </property>
  <property>
    <name>mapred.system.dir</name>
    <value>/hadoop/mapred/system</value>
  </property>
  <property>
    <name>mapred.local.dir</name>
    <value>/tmp/hadoop/tmp_mapred</value>
  </property>
</configuration>
OTHER FILES
• The hadoop-env.sh file must also be edited

# The java implementation to use. Required.
export JAVA_HOME="/Library/Java/JavaVirtualMachines/jdk1.7.0_21.jdk/Contents/Home"
export HADOOP_OPTS="-Djava.net.preferIPv4Stack=true -server"
DFS INITIALIZATION
• Run:
hadoop --config conf_single namenode -format
hadoop-daemon.sh --config conf_single start namenode
hadoop --config conf_single dfs -mkdir /user
hadoop --config conf_single dfs -chmod 755 /user
hadoop --config conf_single dfs -mkdir /tmp
hadoop --config conf_single dfs -chmod 777 /tmp
hadoop --config conf_single dfs -mkdir /mapred
hadoop --config conf_single dfs -chmod 755 /mapred
STARTING THE SYSTEM
• bin/start-all.sh --config conf_single
• Checking system status:
• NameNode (DFS): http://localhost:50070
• JobTracker: http://localhost:50030
FILESYSTEM ACCESS
• The bin/hadoop command invokes the basic Hadoop API.
• The dfs application of the basic API provides access to the filesystem.
• bin/hadoop --config conf_single dfs
DFS
• Based on the UNIX commands
• hadoop --config conf_single dfs -ls
• hadoop --config conf_single dfs -mkdir
• hadoop --config conf_single dfs -chown
• hadoop --config conf_single dfs -chmod
DFS
• To upload files from the local machine to the DFS
• hadoop --config conf_single dfs -put src dst
• hadoop --config conf_single dfs -copyFromLocal src dst
• To download files
• hadoop --config conf_single dfs -get src dst
• hadoop --config conf_single dfs -copyToLocal
INVOKING AN APPLICATION
• If it is in a jar file:
• hadoop jar JAR_FILE.jar MainClass [parameters]
FIRST TEST: WORD COUNT
• We will use the example programs provided by Hadoop:
• hadoop --config conf_cluster jar hadoop-examples-1.1.2.jar
• In particular, wordcount
• hadoop --config conf_cluster jar hadoop-examples-1.1.2.jar wordcount
MAPPER -> REDUCER
• Data is presented as tuples:
• <key><value>
• and must be emitted as
• <key><value>
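The tuple contract above can be simulated in a few lines. This is a hypothetical in-memory sketch of the map, group-by-key, reduce pipeline, not the Hadoop API:

```python
# Minimal simulation of the MapReduce tuple contract: the mapper emits
# <key><value> tuples, the framework groups values by key, and the
# reducer emits <key><value> tuples again.
from collections import defaultdict

def run_job(records, mapper, reducer):
    grouped = defaultdict(list)
    for record in records:
        for key, value in mapper(record):   # map phase: emit tuples
            grouped[key].append(value)      # "shuffle": group by key
    return {key: reducer(key, values) for key, values in grouped.items()}

# Word count expressed with this contract:
lines = ["hola mundo", "hola hadoop"]
counts = run_job(lines,
                 mapper=lambda line: [(w, 1) for w in line.split()],
                 reducer=lambda key, values: sum(values))
# counts == {"hola": 2, "mundo": 1, "hadoop": 1}
```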
TYPICAL ARCHITECTURE

Figure: MapReduce data flow with a single reduce task
ANNUAL MAXIMUM TEMPERATURE CALCULATION
• Database of sensor readings in the U.S.
• Unsorted plain-text data
• Simple structure
• Fields of interest:
• Year: cols 15 to 18
• Temperature: cols 104 to 106. A value of 999 marks an invalid reading
• Col 137 must be 0 or 1
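Note that the column numbers above are 1-based, while the substring indexes in the Java code below are 0-based ("cols 15 to 18" becomes substring(14, 18)). A small sketch of the parsing rule, applied to a hypothetical fixed-width record:

```python
# Hypothetical parser for one sensor record (indexes are 0-based).
def parse_record(line):
    year = line[14:18]                # cols 15 to 18
    quality = line[136:137]           # col 137: must be '0' or '1'
    temperature = int(line[103:106])  # cols 104 to 106
    if quality in ("0", "1") and temperature != 999:  # 999 = invalid
        return year, temperature
    return None

# Build a fake 137-character record with year 1950 and temperature 25.
record = list(" " * 137)
record[14:18] = "1950"
record[103:106] = "025"
record[136] = "0"
record = "".join(record)
# parse_record(record) returns ("1950", 25)
```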
import java.io.IOException;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class MaxTemperatureMapper
    extends Mapper<LongWritable, Text, Text, IntWritable> {

  private static final int MISSING = 999;

  @Override
  public void map(LongWritable key, Text value, Context context)
      throws IOException, InterruptedException {

    String line = value.toString();
    String year = line.substring(14, 18);
    int airTemperature;
    String quality = line.substring(136, 137);
    if (quality.matches("[01]")) {
      airTemperature = Integer.parseInt(line.substring(103, 106).trim());
      if (airTemperature != MISSING) {
        context.write(new Text(year), new IntWritable(airTemperature));
      }
    }
  }
}
import java.io.IOException;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

public class MaxTemperatureReducer
    extends Reducer<Text, IntWritable, Text, IntWritable> {

  @Override
  public void reduce(Text key, Iterable<IntWritable> values, Context context)
      throws IOException, InterruptedException {

    int maxValue = Integer.MIN_VALUE;
    for (IntWritable value : values) {
      maxValue = Math.max(maxValue, value.get());
    }
    context.write(key, new IntWritable(maxValue));
  }
}
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class MaxTemperature {

  /**
   * @param args
   */
  public static void main(String[] args) throws Exception {
    if (args.length != 2) {
      System.err.println("Usage: MaxTemperature <input path> <output path>");
      System.exit(-1);
    }

    Job job = new Job();
    job.setJarByClass(MaxTemperature.class);
    job.setJobName("Max Temperature");

    FileInputFormat.addInputPath(job, new Path(args[0]));
    FileOutputFormat.setOutputPath(job, new Path(args[1]));

    job.setMapperClass(MaxTemperatureMapper.class);
    job.setReducerClass(MaxTemperatureReducer.class);
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(IntWritable.class);

    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}
MAX TEMPERATURE EXAMPLE
• Basic MaxTemp run:

hadoop jar MaxTemp.jar MaxTemperature \
    <dfs_src> \
    <dfs_dst>
TECHNIQUES
• Streaming:
• The MAP and REDUCE applications are written in any language that can read from STDIN and write to STDOUT
• A Hadoop application is invoked that distributes a process to the compute nodes; that process invokes the MAP/REDUCE application it has been given
• hadoop jar ../../hadoop-1.1.2/contrib/streaming/hadoop-streaming-1.1.2.jar
hadoop jar \
    contrib/streaming/hadoop-streaming-1.1.2.jar \
    -input {DFSDIR}/input -output {DFSDIR}/output_py \
    -mapper {GLOBAL_DIR}/map.py \
    -reducer {GLOBAL_DIR}/reduce.py
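The map.py and reduce.py scripts referenced by the command are not shown in these slides; this is a minimal word-count sketch (assumed filenames and logic) of what such a streaming pair could look like. A streaming mapper reads raw lines from STDIN and writes tab-separated key/value lines to STDOUT; Hadoop sorts the mapper output by key before feeding it to the reducer.

```python
# Word-count sketch of a streaming mapper/reducer pair. In the real
# map.py and reduce.py these generators would read sys.stdin and print
# their output; here they are plain functions so the logic is testable.

def map_stream(lines):
    # Mapper: emit one "<word>\t1" line per word.
    for line in lines:
        for word in line.split():
            yield f"{word}\t1"

def reduce_stream(lines):
    # Reducer: input arrives sorted by key, so equal keys are adjacent.
    current, total = None, 0
    for line in lines:
        key, value = line.rsplit("\t", 1)
        if key != current:
            if current is not None:
                yield f"{current}\t{total}"
            current, total = key, 0
        total += int(value)
    if current is not None:
        yield f"{current}\t{total}"

mapped = sorted(map_stream(["hola mundo", "hola hadoop"]))  # the "shuffle"
result = list(reduce_stream(mapped))
# result == ["hadoop\t1", "hola\t2", "mundo\t1"]
```

The sorted-input assumption is what lets the reducer run in constant memory: it only has to compare each key with the previous one.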
STREAMING (CONT.)
• The MAP/REDUCE application MUST BE ACCESSIBLE on each node's regular filesystem (not in Hadoop's DFS)
• Solution: copy the files to clusterdata
COMBINER ARCHITECTURE

Figure: MapReduce data flow with a single reduce task
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class MaxTemperature {

  /**
   * @param args
   */
  public static void main(String[] args) throws Exception {
    if (args.length != 2) {
      System.err.println("Usage: MaxTemperature <input path> <output path>");
      System.exit(-1);
    }

    Job job = new Job();
    job.setJarByClass(MaxTemperature.class);
    job.setJobName("Max Temperature");

    FileInputFormat.addInputPath(job, new Path(args[0]));
    FileOutputFormat.setOutputPath(job, new Path(args[1]));

    job.setMapperClass(MaxTemperatureMapper.class);
    job.setCombinerClass(MaxTemperatureReducer.class);
    job.setReducerClass(MaxTemperatureReducer.class);
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(IntWritable.class);

    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}
WITH COMBINER
• hadoop jar ../../hadoop-1.1.2/contrib/streaming/hadoop-streaming-1.1.2.jar
hadoop jar hadoop-streaming-1.1.2.jar \
    -input {DFSDIR}/input -output {DFSDIR}/output_py \
    -mapper {GLOBAL_DIR}/map.py \
    -combiner {GLOBAL_DIR}/reduce.py \
    -reducer {GLOBAL_DIR}/reduce.py