HDFS for Go
===========
[GoDoc](https://godoc.org/github.com/colinmarc/hdfs) | [CI tests](https://github.com/colinmarc/hdfs/actions/workflows/tests.yml)
This is a native Go client for HDFS. It connects directly to the namenode using
the protocol buffers API.
It tries to be idiomatic by aping the stdlib `os` package where possible, and
reuses types from it, including `os.FileInfo` and `os.PathError`.
Here's what it looks like in action:
```go
client, _ := hdfs.New("namenode:8020")
file, _ := client.Open("/mobydick.txt")
buf := make([]byte, 59)
file.ReadAt(buf, 48847)
fmt.Println(string(buf))
// => Abominable are the tumblers into which he pours his poison.
```
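For a slightly fuller picture of the `os`-like API, here's a hedged sketch that checks errors and lists a directory with `ReadDir` (which returns `[]os.FileInfo`); the namenode address is a placeholder:

```go
package main

import (
	"fmt"
	"log"

	"github.com/colinmarc/hdfs"
)

func main() {
	// "namenode:8020" is a placeholder; point this at your own namenode.
	client, err := hdfs.New("namenode:8020")
	if err != nil {
		log.Fatal(err)
	}
	defer client.Close()

	// ReadDir returns []os.FileInfo, just like a stdlib directory listing.
	files, err := client.ReadDir("/")
	if err != nil {
		log.Fatal(err)
	}

	for _, fi := range files {
		fmt.Println(fi.Name(), fi.Size())
	}
}
```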
For complete documentation, check out the [Godoc][1].
The `hdfs` Binary
-----------------
Along with the library, this repo contains a commandline client for HDFS. Like
the library, its primary aim is to be idiomatic, by enabling your favorite unix
verbs:
```
$ hdfs --help
Usage: hdfs COMMAND
The flags available are a subset of the POSIX ones, but should behave similarly.

Valid commands:
  ls [-lah] [FILE]...
  rm [-rf] FILE...
  mv [-fT] SOURCE... DEST
  mkdir [-p] FILE...
  touch [-amc] FILE...
  chmod [-R] OCTAL-MODE FILE...
  chown [-R] OWNER[:GROUP] FILE...
  cat SOURCE...
  head [-n LINES | -c BYTES] SOURCE...
  tail [-n LINES | -c BYTES] SOURCE...
  du [-sh] FILE...
  checksum FILE...
  get SOURCE [DEST]
  getmerge SOURCE DEST
  put SOURCE DEST
```
Since it doesn't have to wait for the JVM to start up, it's also a lot faster
than `hadoop fs`:
```
$ time hadoop fs -ls / > /dev/null

real    0m2.218s
user    0m2.500s
sys     0m0.376s

$ time hdfs ls / > /dev/null

real    0m0.015s
user    0m0.004s
sys     0m0.004s
```
Best of all, it comes with bash tab completion for paths!
Installing the commandline client
---------------------------------
Grab a tarball from the [releases page](https://github.com/colinmarc/hdfs/releases)
and unzip it wherever you like.
To configure the client, make sure one or both of these environment variables
point to your Hadoop configuration (`core-site.xml` and `hdfs-site.xml`). On
systems with Hadoop installed, they should already be set.
```
$ export HADOOP_HOME="/etc/hadoop"
$ export HADOOP_CONF_DIR="/etc/hadoop/conf"
```
To install tab completion globally on linux, copy or link the `bash_completion`
file which comes with the tarball into the right place:
```
$ ln -sT bash_completion /etc/bash_completion.d/gohdfs
```
By default on non-kerberized clusters, the HDFS user is set to the
currently-logged-in user. You can override this with another environment
variable:
```
$ export HADOOP_USER_NAME=username
```
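The same kind of override is available when using the library directly. Here's a minimal, hedged sketch assuming the `ClientOptions` struct described in the GoDoc; the address and username are placeholders:

```go
package main

import (
	"log"

	"github.com/colinmarc/hdfs"
)

func main() {
	// Both values below are placeholders; substitute your own namenode and user.
	client, err := hdfs.NewClient(hdfs.ClientOptions{
		Addresses: []string{"namenode:8020"},
		User:      "username",
	})
	if err != nil {
		log.Fatal(err)
	}
	defer client.Close()

	// Sanity check: stat the root directory as the configured user.
	if _, err := client.Stat("/"); err != nil {
		log.Fatal(err)
	}
}
```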
Using the commandline client with Kerberos authentication
---------------------------------------------------------
Like `hadoop fs`, the commandline client expects a `ccache` file in the default
location: `/tmp/krb5cc_<uid>`. That means it should 'just work' to use `kinit`:
```
$ kinit bob@EXAMPLE.com
$ hdfs ls /
```
If that doesn't work, try setting the `KRB5CCNAME` environment variable to
wherever you have the `ccache` saved.
Compatibility
-------------
This library uses "Version 9" of the HDFS protocol, which means it should work
with Hadoop distributions based on 2.2.x and above, as well as 3.x.
Acknowledgements
----------------
This library is heavily indebted to [snakebite][3].
[1]: https://godoc.org/github.com/colinmarc/hdfs
[2]: https://golang.org/doc/install
[3]: https://github.com/spotify/snakebite