HDFS Plugin

The DolphinDB HDFS plugin can read files from Hadoop HDFS into DolphinDB and write DolphinDB in-memory tables back to HDFS.

The DolphinDB HDFS plugin has different branches, such as release200 and release130. Each branch corresponds to a DolphinDB server version. Please make sure you are in the correct branch of the plugin documentation.

Install Precompiled Plugin

Pre-compiled plugin files are stored in the DolphinDBPlugin/hdfs/bin/linux64 directory. Download them to /DolphinDB/server/plugins/hdfs.

Specify the path to the dynamic libraries required by the plugin on Linux.

export LD_LIBRARY_PATH=/path/to/plugins/hdfs:$LD_LIBRARY_PATH

// add the folder where the libjvm.so library is located
export LD_LIBRARY_PATH=/path/to/libjvm.so/:$LD_LIBRARY_PATH

Start DolphinDB server and load the plugin

cd DolphinDB/server //navigate into DolphinDB server directory
./dolphindb //start DolphinDB server
loadPlugin("/path/to/plugins/hdfs/PluginHdfs.txt");

Compile the Plugin

Environment Setup

# Download Hadoop
https://hadoop.apache.org
# for ubuntu users
sudo apt install cmake
# for Centos users
sudo yum install cmake

Compiling

cd hdfs
mkdir build
cd build
cmake .. -DHADOOP_DIR=/path/to/your/hadoop/home
make

Methods

connect

Syntax

conn=hdfs::connect(nameMode, port, [userName], [kerbTicketCachePath])

Parameters

  • nameMode: the IP address of the HDFS NameNode. Use "localhost" for a local instance.
  • port: the port number of HDFS. A local installation typically uses port 9000.
  • userName: the user name for login. It is an optional parameter.
  • kerbTicketCachePath: the path to the Kerberos ticket cache. It is an optional parameter.

Details

If the connection is established, a handle is returned. Otherwise, an exception is thrown.
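A minimal connection sketch (assuming the plugin is already loaded and an HDFS NameNode is listening on localhost:9000; the user name is illustrative):

```dolphindb
// connect to a local HDFS instance; the returned handle is passed to all other methods
fs = hdfs::connect("localhost", 9000)

// on an authenticated cluster, a user name can also be supplied (hypothetical user)
// fs = hdfs::connect("localhost", 9000, "hadoopUser")
```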

disconnect

Syntax

disconnect(hdfsFS)

Parameters

  • hdfsFS: the handle returned by the connect() function.

Details

Disconnect from the HDFS.

exists

Syntax

exists(hdfsFS, path)

Parameters

  • hdfsFS: the handle returned by the connect() function.
  • path: an HDFS file path.

Details

Check whether the specified path exists. If it does not exist, an error will be reported.
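Because a nonexistent path raises an error rather than returning false, a try-catch can be used to probe a path without aborting the script (sketch; the path is illustrative):

```dolphindb
// probe a path; a nonexistent path raises an error that we catch
try {
    hdfs::exists(fs, "/user/name")
    print("path exists")
} catch (ex) {
    print("path does not exist")
}
```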

copy

Syntax

copy(hdfsFS1, src, hdfsFS2, dst)

Parameters

  • hdfsFS1: the handle returned by the connect() function.
  • src: the path to the source file.
  • hdfsFS2: the handle returned by the connect() function.
  • dst: the destination path.

Details

Copy a file from one HDFS to another. If the operation fails, an error will be reported.
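A backup copy within the same HDFS can be made by passing the same handle as both hdfsFS1 and hdfsFS2 (sketch; paths are illustrative):

```dolphindb
// back up a file on the same cluster: source and destination share one handle
hdfs::copy(fs, "/tmp/testfile.txt", fs, "/tmp/testfile.txt.bk")
```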

move

Syntax

move(hdfsFS1, src, hdfsFS2, dst)

Parameters

  • hdfsFS1: the handle returned by the connect() function.
  • src: the path to the source file.
  • hdfsFS2: the handle returned by the connect() function.
  • dst: the destination path.

Details

Move a file from one HDFS to another. If the operation fails, an error will be reported.

delete

Syntax

delete(hdfsFS, path, recursive)

Parameters

  • hdfsFS: the handle returned by the connect() function.
  • path: the path of the file to be deleted.
  • recursive: indicates whether to delete files or folders recursively.

Details

Delete a directory or file. If the operation fails, an error will be reported.
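A sketch of both modes (paths are illustrative):

```dolphindb
// delete a single file; recursion is irrelevant for plain files
hdfs::delete(fs, "/tmp/testfile.txt", 0)

// delete a directory together with everything it contains
hdfs::delete(fs, "/tmp/testdir", 1)
```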

rename

Syntax

rename(hdfsFS, oldPath, newPath)

Parameters

  • hdfsFS: the handle returned by the connect() function.
  • oldPath: the path of the file to be renamed.
  • newPath: the path of the file after renaming. If a directory is specified, the source file will be moved into it; if a file is specified, or the parent directory of the specified path is missing, an error will be reported.

Details

Rename or move a file. If the operation fails, an error will be reported.
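For example, passing an existing directory as newPath moves the source file into that directory (sketch; the target directory is illustrative and assumed to exist):

```dolphindb
// move a file into an existing directory by specifying the directory as newPath
hdfs::rename(fs, "/user/name1/testfile.txt", "/user/name/archive")
```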

createDirectory

Syntax

createDirectory(hdfsFS, path)

Parameters

  • hdfsFS: the handle returned by the connect() function.
  • path: the path to the directory to be created.

Details

Create a new directory. If the operation fails, an error will be reported.

chmod

Syntax

chmod(hdfsFS, path, mode)

Parameters

  • hdfsFS: the handle returned by the connect() function.
  • path: the path to the file whose access permissions you want to change.
  • mode: the digits that represent the permissions to set (e.g., 600).

Details

Change the access permissions of a file or directory. If the operation fails, an error will be reported.
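A sketch using a numeric mode (the file path is illustrative):

```dolphindb
// restrict the file to owner read/write only
hdfs::chmod(fs, "/user/name/secret.txt", 600)
```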

getListDirectory

Syntax

fileInfo=getListDirectory(hdfsFS, path)

Parameters

  • hdfsFS: the handle returned by the connect() function.
  • path: the path to the target directory.

Details

Return a handle containing information about all files in the target directory. If the operation fails, an error will be reported.

listDirectory

Syntax

listDirectory(fileInfo)

Parameters

  • fileInfo: the handle returned by the getListDirectory() function.

Details

List all file information in the target directory.

freeFileInfo

Syntax

freeFileInfo(fileInfo)

Parameters

  • fileInfo: the handle returned by the getListDirectory() function.

Details

Release the memory occupied by the directory information.
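The three directory functions are typically used together (sketch; the directory path is illustrative):

```dolphindb
// obtain a directory-information handle, print its contents, then release it
fileInfo = hdfs::getListDirectory(fs, "/user/name/input/")
hdfs::listDirectory(fileInfo)
hdfs::freeFileInfo(fileInfo)    // always free the handle to avoid leaking memory
```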

readFile

Syntax

readFile(hdfsFS, path, handler)

Parameters

  • hdfsFS: the handle returned by the connect() function.
  • path: the path of the file to be loaded.
  • handler: the function used to process the byte stream. It must take exactly two arguments.

Details

Read data from the HDFS server and return an in-memory table holding the data after it has been processed by the handler function.
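For example, the ORC plugin provides a compatible two-argument handler, as shown in the Appendix (sketch; the file path is illustrative):

```dolphindb
// load an ORC file from HDFS into an in-memory table
loadPlugin("/path/to/PluginOrc.txt")
re = hdfs::readFile(fs, "/tmp/testfile.orc", orc::loadORCHdfs)
```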

writeFile

Syntax

writeFile(hdfsFS, path, tb, handler)

Parameters

  • hdfsFS: the handle returned by the connect() function.
  • path: the path of the file to be written.
  • tb: the in-memory table to be saved.
  • handler: the function that converts the in-memory table into a byte stream. It takes only one argument.

Details

Save an in-memory table to HDFS in the format produced by the handler function.
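For example, the Parquet plugin provides a compatible one-argument handler, as shown in the Appendix (sketch; the file path and the table re are illustrative):

```dolphindb
// save the in-memory table re to HDFS as a Parquet file
loadPlugin("/path/to/PluginParquet.txt")
hdfs::writeFile(fs, "/tmp/testfile.parquet", re, parquet::saveParquetHdfs)
```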

Appendix

loadPlugin("/path/to/PluginHdfs.txt");
fs=hdfs::connect("default",9000);
hdfs::exists(fs,"/user/name");
hdfs::exists(fs,"/user/name1");
hdfs::copy(fs,"/tmp/testfile.txt",fs,"/tmp/testfile.txt.bk");
hdfs::copy(fs,"/tmp/testfile1.txt",fs,"/tmp/testfile.txt.bk");
hdfs::move(fs,"/tmp/testfile.txt.bk",fs,"/user/name/input/testfile.txt");
hdfs::move(fs,"/user/name/input/testfile.txt",fs,"/user/name1/testfile.txt");
hdfs::rename(fs,"/user/name1/testfile.txt","/user/name1/testfile.txt.rename");
hdfs::createDirectory(fs,"/user/namme");
hdfs::chmod(fs,"/user/namme",600);
hdfs::delete(fs,"/user/namme",1);

fileInfo=hdfs::getListDirectory(fs,"/user/name/input/");
hdfs::listDirectory(fileInfo);
hdfs::freeFileInfo(fileInfo);

loadPlugin("/path/to/PluginOrc.txt")
re=hdfs::readFile(fs,'/tmp/testfile.orc',orc::loadORCHdfs)

loadPlugin("/path/to/PluginParquet.txt")
hdfs::writeFile(fs,'/tmp/testfile.parquet',re,parquet::saveParquetHdfs)

hdfs::disconnect(fs);