sample

Syntax

sample(partitionCol, size)

Arguments

partitionCol is a partitioning column.

size is a positive floating-point number or integer.

Details

Takes a random sample of partitions from a partitioned table. sample can only be used in a where clause.

Suppose the database has N partitions. If 0<size<1, int(N*size) partitions are sampled. If size is a positive integer, size partitions are sampled.
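
The rule above can be modeled outside the database to check how many partitions a given size selects. The following Python sketch is an illustration only, not DolphinDB's implementation: the function name partitions_to_sample is hypothetical, and it assumes that int truncates toward zero. The source does not say what happens when size is a non-integer greater than 1 or exceeds N, so the sketch rejects the former and does not cap the latter.

```python
def partitions_to_sample(n_partitions, size):
    """Model of the documented rule; hypothetical helper, not a DolphinDB API."""
    if size <= 0:
        raise ValueError("size must be positive")
    if 0 < size < 1:
        # Fractional size: take int(N*size) partitions.
        # Assumption: int() truncates toward zero, as Python's int() does.
        return int(n_partitions * size)
    if float(size).is_integer():
        # Positive integer size: take exactly that many partitions.
        return int(size)
    # Non-integer values above 1 are not covered by the documented rule.
    raise ValueError("size must be in (0, 1) or a positive integer")


print(partitions_to_sample(5, 0.4))  # fraction of 5 partitions
print(partitions_to_sample(5, 2))    # explicit partition count
```

Under this model, a small fraction can select no partitions at all: with 5 partitions and size 0.1, int(5*0.1) is 0.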

Examples

n=1000000
ID=rand(50, n)
x=rand(1.0, n)
t=table(ID, x)
db=database("dfs://rangedb1", RANGE, 0 10 20 30 40 50)
pt = db.createPartitionedTable(t, `pt, `ID)
pt.append!(t)
pt=loadTable(db,`pt);

Table pt has 5 partitions. To take a random sample of 2 partitions, we can use either of the following queries:

x = select * from pt where sample(ID, 0.4);

x = select * from pt where sample(ID, 2);