How to Work with Partitioned In-memory Tables

In DolphinDB, all partition schemes for databases also apply to in-memory tables except composite domain. Working with partitioned in-memory tables can utilize parallel computing capacities of multi-core CPUs.

1. Load Data as In-memory Partitioned Tables

DolphinDB offers various ways to load a data set as a partitioned in-memory table. We start with creating a text file for the examples below.

n=30000000
workDir = "C:/DolphinDB/Data"
if(!exists(workDir)) mkdir(workDir)
trades=table(rand(`IBM`MSFT`GM`C`FB`GOOG`V`F`XOM`AMZN`TSLA`PG`S,n) as sym, 2000.01.01+rand(365,n) as date, 10.0+rand(2.0,n) as price1, 100.0+rand(20.0,n) as price2, 1000.0+rand(200.0,n) as price3, 10000.0+rand(2000.0,n) as price4, 10000.0+rand(3000.0,n) as price5, 10000.0+rand(4000.0,n) as price6, rand(10,n) as qty1, rand(100,n) as qty2, rand(1000,n) as qty3, rand(10000,n) as qty4, rand(10000,n) as qty5, rand(10000,n) as qty6)
trades.saveText(workDir + "/trades.txt")

1.1. Use Function ploadText to Generate an In-memory Table with Sequential Partition

This approach is the most simple way to create a partitioned in-memory table. However, it does not have certain benefits or flexibilities of other approaches. For examples, the input data file must be smaller than available memory; the partitioned in-memory table generated by this approach cannot use function sortBy! to perform meaningful within-partition sorting.

trades = ploadText(workDir + "/trades.txt");

1.2. Use Function loadTextEx to Generate an In-memory Table with Specified Partition Scheme

This approach is for the following use cases:

  • Frequently conduct within-partition sorting.
  • Frequently use the partitioning column as the "group by" or "context by" column.

The input data file must be smaller than available memory.

We need to use empty string ("") for parameter "tableName" of function loadTextEx and parameter "directory" of function database.

db = database("", VALUE, `IBM`MSFT`GM`C`FB`GOOG`V`F`XOM`AMZN`TSLA`PG`S)
trades = db.loadTextEx("", `sym, workDir + "/trades.txt");

Use function sortBy! for within-partition sorting.

trades.sortBy!(`qty1);

trades.sortBy!(`date`qty1, false true);

trades.sortBy!(<qty1 * price1>, false);

When the partitioning column is the grouping column, the grouping calculation has the optimal performance. The following table compares the performance of the following query on 3 tables: an unpatitioned table generated by trades=loadText(workDir+"/trades.txt"), the partitioned table with sequential partition in 1.1, and the partitioned table with value partition on "sym" in this section.

timer(10) select std(qty1) from trades group by sym;

Here "timer(10)" means the total time consumed when the query is executed 10 times.

TableGenerated byTime Consumed
unpartitioned tableloadText3.69 second
partitioned table with sequential domainploadText2.51 second
partitioned table with value domainloadTextEx0.17 second

1.3. Use Function loadTable to Import Selected or All Partitions of a Table on Disk into Memory

This approach is for the following use cases:

  • To import a text file larger than available memory and only a subset of the data are needed each time.
  • The same data set is used repeatedly. It is much faster to load a database table than to import a text file.

Use function loadTextEx to create a table in a partitioned database on disk: (can also use function createPartitionedTable and append!)

db = database(workDir+"/tradeDB", RANGE, ["A","G","M","S","ZZZZ"])
db.loadTextEx(`trades, `sym, workDir + "/trades.txt");

To load only 2 partitions (["A","G") and ["M","S")) into memory:

db = database(workDir+"/tradeDB")
trades=loadTable(db, `trades, ["A", "M"], 1);

Please note that we need to set the optional parameter "memoryMode" of function loadTable to 1. Otherwise only the metadata of the table are loaded.

1.3.1. Use Function loadTableBySQL to Import Selected Columns and/or Rows of a Partitioned Table on Disk

This is the most flexible way to create a partitioned in-memory table. It needs to be used after function loadTable.

db = database(workDir+"/tradeDB")
trades=loadTable(db, `trades);

sample=loadTableBySQL(<select * from trades where date between 2000.03.01 : 2000.05.01>);

sample=loadTableBySQL(<select sym, date, price1, qty1 from trades where date between 2000.03.01 : 2000.05.01>);

dates = 2000.01.16 2000.02.14 2000.08.01
st = sql(<select sym, date, price1, qty1>, trades, expr(<date>, in, dates))
sample = loadTableBySQL(st);

colNames =`sym`date`qty2`price2
st= sql(sqlCol(colNames), trades)
sample = loadTableBySQL(st);

2. Data Manipulation with In-memory Partitioned Tables

2.1. Add New Columns

trades = ploadText(workDir + "/trades.txt");
  1. SQL update statement
    update trades set logPrice1= log(price1), newQty1= double(qty1);
  2. function update!
    trades.update!(`logPrice1`newQty1, <[log(price1), double(qty1)]>);
  3. assignment statement
    trades[`logPrice1`newQty1] = <[log(price1), double(qty1)]>;

2.2. Update Existing Columns

  1. SQL update statement
    update trades set qty1=qty1+10;
    
    update trades set qty1=qty1+10 where sym = `IBM;
  2. function update!
    trades.update!(`qty1, <qty1+10>);
    
    trades.update!(`qty1, <qty1+10>, <sym=`IBM>);
  3. assginment statement
    trades[`qty1] = <qty1+10>;
    
    trades[`qty1, <sym=`IBM>] = <qty1+10>;

2.3. Delete Rows

  1. SQL delete statement

    delete from trades where qty3<20;
  2. function erase!

    trades.erase!(< qty3<30 >);

2.4. Drop Columns

trades.drop!("qty1");

2.5. Rename Columns

trades.rename!("qty2", "qty2New");