Hadoop: cannot set default FileSystem as HDFS in core-site.xml

filesystems, hadoop, hdfs

I am using Hadoop 1.0.3 in pseudo-distributed mode, and my conf/core-site.xml is set as follows:

<configuration>
    <property>
        <name>fs.default.name</name>
        <value>hdfs://localhost:9000</value>
    </property>
    <property>
        <name>mapred.child.tmp</name>
        <value>/home/administrator/hadoop/temp</value>
    </property>
</configuration>

So I believed that my default filesystem is set to HDFS.
However, when I run the following code:

Configuration conf = new Configuration();
FileSystem fs = FileSystem.get(conf);

I thought that fs should be a DistributedFileSystem instance. However, it turns out to be a LocalFileSystem instance.
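(For reference, this is how I checked which implementation came back, continuing from the snippet above:)

System.out.println(fs.getClass().getName()); // prints org.apache.hadoop.fs.LocalFileSystem, not DistributedFileSystem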

But, if I run the following code:

Configuration conf = new Configuration();
conf.set("fs.default.name", "hdfs://localhost:9000");
FileSystem fs = FileSystem.get(conf);

Then I can get a DistributedFileSystem fs.

Isn't my default FileSystem set to HDFS in core-site.xml? If not, how should I set that?

Best Solution

The Eclipse environment doesn't know where the conf directory under the Hadoop install directory is, so it cannot find core-default.xml and core-site.xml unless those files are added to the Eclipse classpath so that they are loaded first.

Since these are not on the Eclipse classpath, the default core-site.xml is loaded from the jar file hadoop-*-core.jar (e.g. hadoop-0.20.2-core.jar for version 0.20), which has the local file system as the default file system, and hence you are seeing a LocalFileSystem object instead of a DistributedFileSystem.
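If you prefer not to touch the Eclipse classpath, you can also point the Configuration at the file explicitly in code. A minimal sketch, assuming your conf directory lives under /home/administrator/hadoop/conf (adjust the path to your own <HADOOP_INSTALL>/conf; the class name FsCheck is just for illustration):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class FsCheck {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Explicitly load core-site.xml instead of relying on the classpath.
        // The path below is an example; point it at your own <HADOOP_INSTALL>/conf.
        conf.addResource(new Path("/home/administrator/hadoop/conf/core-site.xml"));
        FileSystem fs = FileSystem.get(conf);
        System.out.println(fs.getClass().getName()); // should now print a DistributedFileSystem class
    }
}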

So, to add the <HADOOP_INSTALL>/conf directory to the Eclipse project classpath, go to the project properties (Project -> Properties) -> Java Build Path -> Libraries tab -> Add External Class Folder, and select the conf directory from <HADOOP_INSTALL>.

This should add your core-site.xml to your Eclipse classpath, and all your settings should override the default ones.
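To confirm the override took effect, a quick check (assuming the same Configuration and FileSystem imports as above):

Configuration conf = new Configuration();
System.out.println(conf.get("fs.default.name")); // should now print hdfs://localhost:9000
System.out.println(conf);                        // toString() lists the loaded resources; core-site.xml should appear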
