HDFS
The DolphinDB HDFS plugin can read files from Hadoop HDFS into DolphinDB and write DolphinDB in-memory tables back to HDFS.
The DolphinDB HDFS plugin has different branches, such as release200 and release130. Each branch corresponds to a DolphinDB server version. Please make sure you are in the correct branch of the plugin documentation.
Install Precompiled Plugin
Pre-compiled plugin files are stored in the DolphinDBPlugin/hdfs/bin/linux64 directory. Download them to /DolphinDB/server/plugins/hdfs.
Specify the path to the dynamic libraries required by the plugin on Linux.
export LD_LIBRARY_PATH=/path/to/plugins/hdfs:$LD_LIBRARY_PATH
# Locate the folder containing the libjvm.so library.
export LD_LIBRARY_PATH=/path/to/libjvm.so/:$LD_LIBRARY_PATH
Start DolphinDB server and load the plugin
cd DolphinDB/server    # navigate to the DolphinDB server directory
./dolphindb            # start the DolphinDB server
loadPlugin("/path/to/plugins/hdfs/PluginHdfs.txt");
Compile the Plugin
Environment Setup
# Download Hadoop
https://hadoop.apache.org
# for ubuntu users
sudo apt install cmake
# for Centos users
sudo yum install cmake
Compiling
cd hdfs
mkdir build
cd build
cmake .. -DHADOOP_DIR=/path/to/your/hadoop/home
make
Methods
connect
Syntax
conn=hdfs::connect(nameNode, port, [userName], [kerbTicketCachePath])
Parameters
- nameNode: the IP address of the HDFS NameNode. If HDFS is deployed locally, "localhost" can also be used.
- port: the port number of HDFS. For a local deployment, the port is typically 9000.
- userName: the user name for login.
- kerbTicketCachePath: (optional) the path to the Kerberos ticket cache.
Details
If the connection is established, a handle is returned; otherwise, an exception is thrown.
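For example, connecting to a locally deployed HDFS (assuming the NameNode listens on port 9000 and the plugin has already been loaded):

```
// Connect to the local HDFS NameNode.
// conn is the handle passed to all other methods of this plugin.
conn = hdfs::connect("localhost", 9000);
```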
disconnect
Syntax
disconnect(hdfsFS)
Parameters
- hdfsFS: the handle returned by the connect() function.
Details
Disconnect from the HDFS.
exists
Syntax
exists(hdfsFS, path)
Parameters
- hdfsFS: the handle returned by the connect() function.
- path: an HDFS file path.
Details
Check whether the specified path exists. If it does not exist, an error is reported.
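A typical existence check, assuming a connection handle conn returned by hdfs::connect (the path is a placeholder):

```
// Throws an error if the path does not exist on HDFS.
hdfs::exists(conn, "/user/name/input");
```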
copy
Syntax
copy(hdfsFS1, src, hdfsFS2, dst)
Parameters
- hdfsFS1: the handle returned by the connect() function.
- src: the path to the source file.
- hdfsFS2: the handle returned by the connect() function.
- dst: the destination path.
Details
Copy a file from one HDFS to another. If the operation fails, an error is reported.
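Passing the same handle as both hdfsFS1 and hdfsFS2 copies a file within a single HDFS, as in the Appendix (handle name conn assumed):

```
// Make a backup copy of a file on the same HDFS.
hdfs::copy(conn, "/tmp/testfile.txt", conn, "/tmp/testfile.txt.bk");
```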
move
Syntax
move(hdfsFS1, src, hdfsFS2, dst)
Parameters
- hdfsFS1: the handle returned by the connect() function.
- src: the path to the source file.
- hdfsFS2: the handle returned by the connect() function.
- dst: the destination path.
Details
Move a file from one HDFS to another. If the operation fails, an error is reported.
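As with copy, the same handle can be passed twice to move a file within one HDFS (handle name conn assumed, paths are placeholders):

```
// Move a file into a user directory on the same HDFS.
hdfs::move(conn, "/tmp/testfile.txt.bk", conn, "/user/name/input/testfile.txt");
```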
delete
Syntax
delete(hdfsFS, path, recursive)
Parameters
- hdfsFS: the handle returned by the connect() function.
- path: the path of the file to be deleted.
- recursive: indicates whether to delete a directory and its contents recursively.
Details
Delete a directory or file. If the operation fails, an error is reported.
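A sketch of both modes, assuming 1 enables recursive deletion as in the Appendix example (handle name and paths are placeholders):

```
// Recursively delete a directory and everything under it.
hdfs::delete(conn, "/tmp/old_data", 1);
// Delete a single file (no recursion needed).
hdfs::delete(conn, "/tmp/testfile.txt", 0);
```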
rename
Syntax
rename(hdfsFS, oldPath, newPath)
Parameters
- hdfsFS: the handle returned by the connect() function.
- oldPath: the path of the file to be renamed.
- newPath: the path of the file after renaming. If a directory is specified, the source file will be moved into it; if an existing file is specified, or the specified parent directory is missing, an error will be reported.
Details
Rename or move a file. If the operation fails, an error is reported.
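For example (handle name and paths are placeholders):

```
// Rename a file in place.
hdfs::rename(conn, "/tmp/testfile.txt", "/tmp/testfile.txt.rename");
// Move the file into an existing directory by passing a directory as newPath.
hdfs::rename(conn, "/tmp/testfile.txt.rename", "/user/name/input");
```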
createDirectory
Syntax
createDirectory(hdfsFS, path)
Parameters
- hdfsFS: the handle returned by the connect() function.
- path: the path to the directory to be created.
Details
Create a new directory. If the operation fails, an error is reported.
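For example (handle name and path are placeholders):

```
// Create a new directory on HDFS.
hdfs::createDirectory(conn, "/user/name/newdir");
```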
chmod
Syntax
chmod(hdfsFS, path, mode)
Parameters
- hdfsFS: the handle returned by the connect() function.
- path: the path to the file whose access permissions are to be changed.
- mode: the digits representing the permissions (e.g., 600).
Details
Change the access permissions of a file or directory. If the operation fails, an error is reported.
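For example, restricting access to the owner only, with the mode 600 used in the Appendix (handle name and path are placeholders):

```
// Set permissions to 600 (owner read-write only).
hdfs::chmod(conn, "/user/name/secret.txt", 600);
```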
getListDirectory
Syntax
fileInfo=getListDirectory(hdfsFS, path)
Parameters
- hdfsFS: the handle returned by the connect() function.
- path: the path to the target directory.
Details
Return a handle containing information about all files in the target directory. If the operation fails, an error is reported.
listDirectory
Syntax
listDirectory(fileInfo)
Parameters
- fileInfo: the handle returned by the getListDirectory() function.
Details
List all file information in the target directory.
freeFileInfo
Syntax
freeFileInfo(fileInfo)
Parameters
- fileInfo: the handle returned by the getListDirectory() function.
Details
Free up space occupied by directory information.
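The three directory functions are typically used together; a minimal sketch (handle name conn assumed, path is a placeholder):

```
// Fetch directory metadata, display it, then release the handle.
fileInfo = hdfs::getListDirectory(conn, "/user/name/input/");
hdfs::listDirectory(fileInfo);   // list all file information
hdfs::freeFileInfo(fileInfo);    // free the metadata when done
```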
readFile
Syntax
readFile(hdfsFS, path, handler)
Parameters
- hdfsFS: the handle returned by the connect() function.
- path: the path of the file to be loaded.
- handler: the function used to process the byte stream. It must take exactly two arguments.
Details
Read data from the HDFS server. Return an in-memory table that stores the data after it is processed with the handler function.
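The handler is typically supplied by another format plugin. For instance, the Appendix uses the ORC plugin's orc::loadORCHdfs as the handler (handle name conn assumed):

```
// Load the ORC plugin, then read an ORC file from HDFS into an in-memory table.
loadPlugin("/path/to/PluginOrc.txt");
t = hdfs::readFile(conn, "/tmp/testfile.orc", orc::loadORCHdfs);
```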
writeFile
Syntax
writeFile(hdfsFS, path, tb, handler)
Parameters
- hdfsFS: the handle returned by the connect() function.
- path: the path of the file to be written.
- tb: the in-memory table to be saved.
- handler: the function that converts the in-memory table to a byte stream. It takes only one argument.
Details
Save an in-memory table to HDFS in a specific format.
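Similarly, the Appendix uses the Parquet plugin's parquet::saveParquetHdfs as the handler (handle and table names assumed):

```
// Load the Parquet plugin, then save an in-memory table t to HDFS as Parquet.
loadPlugin("/path/to/PluginParquet.txt");
hdfs::writeFile(conn, "/tmp/testfile.parquet", t, parquet::saveParquetHdfs);
```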
Appendix
loadPlugin("/path/to/PluginHdfs.txt");
fs=hdfs::connect("default",9000);
hdfs::exists(fs,"/user/name");
hdfs::exists(fs,"/user/name1");
hdfs::copy(fs,"/tmp/testfile.txt",fs,"/tmp/testfile.txt.bk");
hdfs::copy(fs,"/tmp/testfile1.txt",fs,"/tmp/testfile.txt.bk");
hdfs::move(fs,"/tmp/testfile.txt.bk",fs,"/user/name/input/testfile.txt");
hdfs::move(fs,"/user/name/input/testfile.txt",fs,"/user/name1/testfile.txt");
hdfs::rename(fs,"/user/name1/testfile.txt","/user/name1/testfile.txt.rename");
hdfs::createDirectory(fs,"/user/namme");
hdfs::chmod(fs,"/user/namme",600);
hdfs::delete(fs,"/user/namme",1);
fileInfo=hdfs::getListDirectory(fs,"/user/name/input/");
hdfs::listDirectory(fileInfo);
hdfs::freeFileInfo(fileInfo);
hdfs::disconnect(fs);
loadPlugin("/path/to/PluginOrc.txt")
re=hdfs::readFile(fs,'/tmp/testfile.orc',orc::loadORCHdfs)
loadPlugin("/path/to/PluginParquet.txt")
hdfs::writeFile(fs,'/tmp/testfile.parquet',re,parquet::saveParquetHdfs)