extractTextSchema

Syntax

extractTextSchema(filename, [delimiter], [skipRows=0])

Details

Generates the schema table for the input data file. The schema table has two columns: name (the column names) and type (their data types).

When the input file contains dates and times:

  • Data with delimiters (date delimiters "-", "/" and "."; time delimiter ":") is parsed as the corresponding temporal type. For example, "12:34:56" is parsed as SECOND and "23.04.10" is parsed as DATE.
  • Data without delimiters is preferentially parsed as DATE when it is in the format "yyMMdd" with 0<=yy<=99, 0<=MM<=12, 1<=dd<=31, or in the format "yyyyMMdd" with 1900<=yyyy<=2100, 0<=MM<=12, 1<=dd<=31.
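The two rules above can be tried on a small hand-written file. This is a minimal sketch: the file path is hypothetical, and the built-in functions file, writeLine and close are used only to create the test data. According to the rules, d1 and d2 would be expected to come back as DATE and t1 as SECOND, but the actual result may depend on the server version.

```
// Hypothetical file path, used only for illustration
filePath = "C:/DolphinDB/Data/dates.csv"

// Write a header and two records: a delimited date, a delimited time,
// and a date without delimiters in the yyMMdd format
f = file(filePath, "w")
f.writeLine("d1,t1,d2")
f.writeLine("23.04.10,12:34:56,230410")
f.writeLine("23.04.11,12:34:57,230411")
f.close()

// Inspect the types inferred for the three columns
extractTextSchema(filePath)
```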
Note: From version 1.30.22/2.00.10 onwards, extractTextSchema supports data files in which a record contains multiple newlines.

Parameters

filename is a string indicating the absolute path of the input data file. Currently only .csv files are supported.

delimiter (optional) is a string indicating the table column separator. It can consist of one or more characters, with the default being a comma (',').

skipRows (optional) is an integer between 0 and 1024 indicating the number of rows at the beginning of the text file to skip. The default value is 0.
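As an illustration of the delimiter parameter, the sketch below writes a small table with "|" as the separator and then infers its schema with the same separator. The file path, table contents and variable names are made up for this example; it assumes saveText accepts the delimiter as its third argument.

```
// Hypothetical file path and sample data, used only for illustration
filePath = "C:/DolphinDB/Data/t2.txt"
t2 = table(1 2 3 as id, 10.5 11.2 9.8 as price)

// Write the table with "|" as the column separator
saveText(t2, filePath, "|")

// Pass the same separator so the columns are split correctly
schema2 = extractTextSchema(filePath, "|")
schema2
```

If the delimiter argument were omitted here, the default comma would be used, so the "|" separated columns would likely not be split as intended.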

Returns

A table.

Examples

// Build a sample table with 1,000,000 rows and save it as a text file
n=1000000
timestamp=09:30:00+rand(18000,n)
ID=rand(100,n)
qty=100*(1+rand(100,n))
price=5.0+rand(100.0,n)
t1 = table(timestamp,ID,qty,price)
saveText(t1, "C:/DolphinDB/Data/t1.txt")

// Infer the column names and data types from the saved file
schema=extractTextSchema("C:/DolphinDB/Data/t1.txt");
schema;

name      type
timestamp SECOND
ID        INT
qty       INT
price     DOUBLE
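A typical follow-up is to adjust the inferred schema before loading the file. The sketch below assumes the file C:/DolphinDB/Data/t1.txt from the example above already exists. It also relies on loadText, which is not described in this section, and assumes that it takes the schema table as its third parameter. The choice of LONG for qty is arbitrary and only demonstrates the pattern.

```
// Infer the schema from the file written in the example above
schema = extractTextSchema("C:/DolphinDB/Data/t1.txt")

// Override one inferred type: read qty as LONG instead of INT
update schema set type="LONG" where name="qty"

// Load the file with the edited schema (delimiter left at its default)
t = loadText("C:/DolphinDB/Data/t1.txt", , schema)
```

Editing the schema table this way keeps the inferred column names and changes only the types that need correcting.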