2014-11-18
The Hadoop version is 2.4.1, set up on Linux Mint 16.
On my Desktop there is a file t1.txt with the following content:
Sign up for GitHub. By clicking "Sign up for GitHub", you agree to our terms of service and privacy policy. We will send you account related emails occasionally
The file /input/t1.txt in HDFS has the same content.
Creating the project and importing the jars
Create a project named LearnHDFS in Eclipse, then import hadoop-2.4.1/share/hadoop/common/hadoop-common-2.4.1.jar, hadoop-2.4.1/share/hadoop/hdfs/hadoop-hdfs-2.4.1.jar, and all of the jars under the hadoop-2.4.1/share/hadoop/common/lib/ directory.
Configuring log4j
Add a file named log4j.properties to the LearnHDFS project with the following content:
```
log4j.rootLogger=INFO, stdout
log4j.appender.stdout=org.apache.log4j.ConsoleAppender
log4j.appender.stdout.layout=org.apache.log4j.PatternLayout
log4j.appender.stdout.layout.ConversionPattern=%d %p [%c] - %m%n
log4j.appender.logfile=org.apache.log4j.FileAppender
log4j.appender.logfile.File=target/spring.log
log4j.appender.logfile.layout=org.apache.log4j.PatternLayout
log4j.appender.logfile.layout.ConversionPattern=%d %p [%c] - %m%n
```
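With this file on the classpath, Hadoop's internal loggers print to the console, and your own classes can use the same configuration. A minimal sketch of logging through Log4j 1.2 directly (the class name LogDemo and the message are my own examples, not from the project above):

```java
import org.apache.log4j.Logger;

public class LogDemo {
    // the logger inherits the INFO level and the ConsoleAppender from rootLogger
    private static final Logger LOG = Logger.getLogger(LogDemo.class);

    public static void main(String[] args) {
        LOG.info("log4j is configured"); // formatted by the stdout appender's pattern
    }
}
```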
Reading a local file
Add a file named ReadLocalFile.java to the LearnHDFS project with the following content:
```java
import org.apache.hadoop.conf.*;
import org.apache.hadoop.fs.*;
import java.io.*;

public class ReadLocalFile {
    public static void main(String[] args) {
        try {
            Configuration conf = new Configuration();
            // no core-site.xml on the project classpath, so the default is the local filesystem
            FileSystem fs = FileSystem.get(conf);
            Path file = new Path("file:///home/sunlt/Desktop/t1.txt");
            FSDataInputStream getIt = fs.open(file);
            BufferedReader d = new BufferedReader(new InputStreamReader(getIt));
            String s;
            while ((s = d.readLine()) != null) {
                System.out.println(s);
            }
            d.close();
            fs.close();
        } catch (Exception e) {
            e.printStackTrace();
        }
    }
}
```
Running ReadLocalFile.java directly from Eclipse prints:
Sign up for GitHub. By clicking "Sign up for GitHub", you agree to our terms of service and privacy policy. We will send you account related emails occasionally
Reading a file from HDFS
Add a file named ReadHDFSFile.java to the LearnHDFS project with the following content:
```java
import org.apache.hadoop.conf.*;
import org.apache.hadoop.fs.*;
import java.io.*;
import java.net.URI;

public class ReadHDFSFile {
    public static void main(String[] args) {
        try {
            Configuration conf = new Configuration();
            // connect to the NameNode address configured in core-site.xml
            FileSystem fs = FileSystem.get(new URI("hdfs://localhost:9000"), conf);
            Path file = new Path("/input/t1.txt");
            FSDataInputStream getIt = fs.open(file);
            BufferedReader d = new BufferedReader(new InputStreamReader(getIt));
            String s;
            while ((s = d.readLine()) != null) {
                System.out.println(s);
            }
            d.close();
            fs.close();
        } catch (Exception e) {
            e.printStackTrace();
        }
    }
}
```
Because Hadoop's core-site.xml contains:
```xml
<property>
    <name>fs.default.name</name>
    <value>hdfs://localhost:9000</value>
</property>
```
the first argument to FileSystem.get() is set to new URI("hdfs://localhost:9000"). Running ReadHDFSFile.java directly from Eclipse prints:
Sign up for GitHub. By clicking "Sign up for GitHub", you agree to our terms of service and privacy policy. We will send you account related emails occasionally
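Writing to HDFS mirrors the read path: open the same FileSystem, then call create() instead of open(). A hedged sketch (the path /output/t2.txt and its content are my own examples; it assumes the same fs.default.name as above and a running cluster):

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import java.net.URI;

public class WriteHDFSFile {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(new URI("hdfs://localhost:9000"), conf);
        // create() overwrites an existing file at this path by default
        FSDataOutputStream out = fs.create(new Path("/output/t2.txt"));
        out.writeBytes("hello HDFS\n");
        out.close();
        fs.close();
        System.out.println("wrote /output/t2.txt");
    }
}
```

The result can then be checked from the shell with `hdfs dfs -cat /output/t2.txt`.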
Problems encountered
**1.** If only hadoop-2.4.1/share/hadoop/common/hadoop-common-2.4.1.jar and hadoop-2.4.1/share/hadoop/hdfs/hadoop-hdfs-2.4.1.jar are imported, Eclipse reports no errors while editing, but running the program from Eclipse fails because some classes cannot be found, for example:

```
Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/commons/logging/LogFactory
```
**2.** If log4j is not configured, the following warnings appear at runtime:

```
log4j:WARN No appenders could be found for logger (org.apache.hadoop.metrics2.lib.MutableMetricsFactory).
log4j:WARN Please initialize the log4j system properly.
log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for more info.
```

So it is best to add a log4j.properties file and configure it appropriately.
File operations on HDFS
File operations include reading and writing files, creating directories, deleting files and directories, checking file status, and so on. These are described in some detail in **Chapter 3, The Hadoop Distributed Filesystem, of Tom White's *Hadoop: The Definitive Guide*, 2nd Edition**.
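As a rough sketch of the other operations just mentioned (the directory name /demo is my own example; it assumes the same fs.default.name and a running cluster as before):

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import java.net.URI;

public class HDFSOps {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(new URI("hdfs://localhost:9000"), conf);

        Path dir = new Path("/demo");
        fs.mkdirs(dir); // create a directory, including missing parents (like mkdir -p)

        FileStatus st = fs.getFileStatus(dir); // query metadata for the path
        System.out.println(st.isDirectory() + " " + st.getPath());

        fs.delete(dir, true); // second argument: delete recursively
        fs.close();
    }
}
```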
The following two blogs have well-organized sample code: