Series

Series, one of the most commonly used data structures in DolphinDB pandas, is a one-dimensional labeled array that can hold various data types, including numeric values, strings, and Boolean values. Each Series consists of two components:

  • Data: the actual values stored in the Series.

  • Index: the labels used to identify and access those values.
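The following sketch models a Series as a plain Python class that holds a values list and a label list. `ToySeries`, `by_label`, and `by_position` are hypothetical names used only for illustration and are not part of the DolphinDB pandas API. When no index is given, the sketch assumes a default of 0 to n-1, which mirrors the RangeIndex default described under Creating a Series.

```python
class ToySeries:
    """Hypothetical stand-in for a Series: a values list paired with a label list."""

    def __init__(self, data, index=None):
        data = list(data)
        # Default index mirrors RangeIndex: 0, 1, ..., len(data) - 1
        index = list(range(len(data))) if index is None else list(index)
        if len(index) != len(data):
            raise ValueError("index and data must have the same length")
        self.data = data
        self.index = index

    def by_label(self, label):
        # Find the position of the label, then return the value stored there
        return self.data[self.index.index(label)]

    def by_position(self, pos):
        # Positional access ignores the labels entirely
        return self.data[pos]


s = ToySeries([20, 21, 12], ["London", "New York", "Helsinki"])
print(s.by_label("Helsinki"))  # 12
print(s.by_position(0))        # 20
```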

Constructor

Series(data=None, index=None, lazy=False)
  • data must be strongly typed and can be a Python list, a DolphinDB vector, a pandas Series, or None.

  • index must be strongly typed and can be a Python list, a DolphinDB vector, a pandas Series, an Index, or None. Multiple indices are currently not supported.

  • lazy (optional) is a Boolean value indicating whether to create a lazy Series. The default value is False. A lazy Series is a view of the original object whose calculations are deferred. A non-lazy Series is a copy of the original object whose calculations are executed immediately.

Note:

  • The current version of DolphinDB pandas does not support specifying data as a column of a partitioned table.

  • If data is null, the Series is automatically filled with None and its data type is DOUBLE.

  • If data is a pandas Series, index cannot be specified.

  • If lazy=True, index cannot be a list whose items are all None, such as [None, None].
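The following pure-Python sketch illustrates the general difference between a lazy view and an eager copy. `LazyView` is a hypothetical class. It keeps a reference to the source list, records transformations, and evaluates them only when `compute()` is called. The eager version is an independent list that is computed up front. This shows the general technique only and is not how DolphinDB pandas implements lazy Series internally.

```python
class LazyView:
    """Hypothetical view: references the source and defers a pending transform."""

    def __init__(self, source, func=None):
        self._source = source  # shared reference, not a copy
        self._func = func      # transformation recorded but not yet applied

    def map(self, func):
        # Compose with any pending transform; nothing is computed here
        prev = self._func
        composed = func if prev is None else (lambda x: func(prev(x)))
        return LazyView(self._source, composed)

    def compute(self):
        # Evaluate only now, against the current contents of the source
        f = self._func or (lambda x: x)
        return [f(x) for x in self._source]


source = [20, 21, 12]
eager = [x * 2 for x in source]               # copy: computed immediately
lazy = LazyView(source).map(lambda x: x * 2)  # view: computation deferred
source[0] = 100                               # change the original afterwards
print(eager)           # [40, 42, 24]
print(lazy.compute())  # [200, 42, 24]
```

The eager list keeps the values that existed when it was built. The view reflects the later change to `source` because it reads the source only at evaluation time.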

Usage Notes:

  • The dtype parameter of Series functions accepts only data types supported by DolphinDB, such as ddb.DOUBLE and ddb.STRING.

  • Some Series functions support only a subset of their pandas parameters, and the supported parameters must be passed as keyword arguments. For instance, Series.groupby in DolphinDB pandas supports only the parameters by and dropna, which are passed as follows: Series.groupby(by=["a", "a", "b", "b"], dropna=False).
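The keyword requirement resembles Python's keyword-only parameters. In the hypothetical helper below, the bare `*` forces `by` and `dropna` to be passed by name. The body sketches what the two supported groupby parameters mean for a sum aggregation, assuming pandas semantics in which dropna=True drops rows whose group key is missing. `groupby_sum` is an illustration and not a DolphinDB pandas function.

```python
def groupby_sum(values, *, by, dropna=True):
    """Hypothetical helper: sum `values` per key in `by`.

    The bare `*` makes `by` and `dropna` keyword-only, so
    groupby_sum(vals, keys) raises TypeError.
    """
    if len(values) != len(by):
        raise ValueError("values and by must have the same length")
    groups = {}
    for key, value in zip(by, values):
        if key is None and dropna:
            continue  # dropna=True skips rows whose group key is missing
        groups[key] = groups.get(key, 0) + value
    return groups


print(groupby_sum([1, 2, 3, 4], by=["a", "a", "b", "b"]))
# {'a': 3, 'b': 7}
print(groupby_sum([1, 2, 3, 4], by=["a", None, "b", None], dropna=False))
# {'a': 1, None: 6, 'b': 3}
```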

Attributes

Axis

DolphinDB pandas currently supports the following axis attributes:

Function | Description | Compatibility Statement
Series.index | The index (axis labels) of the Series. | Multiple indices are not supported.
Series.values | Return the Series as a DolphinDB vector. | -

Conversion

DolphinDB pandas currently supports the following functions for conversion:

Function | Description | Compatibility Statement
Series.astype(dtype) | Cast the Series to the specified dtype. | Parameters copy and errors are currently not supported.
Series.copy() | Copy the data and index of the calling object. | Parameter deep is currently not supported.
Series.bool() | Return the Boolean value of a single-element Series. | -
Series.to_list() | Return a list of the Series values. | -

Indexing / Iteration

DolphinDB pandas currently supports the following functions for indexing or iteration:

Function | Description | Compatibility Statement
Series.get(key) | Get the item from the object for the given key. | Series.get(None) and Series.get(ddb.NULL) return None.
Series.at | Access a single value by label. | When at[i]=val assigns a value whose data type differs from that of the Series, DolphinDB pandas attempts to convert the Series type and raises an error if the conversion fails. Accessing a lazy Series with Series.at returns a Series.
Series.iat | Access a single value by integer position. | Same as Series.at.
Series.loc | Access a group of rows by label(s) or a Boolean array. | Same as Series.at.
Series.iloc | Purely integer-location based indexing for selection by position. | Same as Series.at.
Series.keys() | Return the index (an alias for Series.index). | -
Series.pop(item) | Remove the given item from the Series and return it. | -
Series.item() | Return the single value in the Series as a scalar. | -
Series.xs(key[, axis, level, drop_level]) | Return a cross-section from the Series for the given key value. | -
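The documented behavior of Series.get can be modeled as a label lookup that returns a default value instead of raising an error. The sketch below is a hypothetical pure-Python model, and `series_get` is not a library function. It assumes pandas semantics, where a missing label returns `default` (None unless another value is given).

```python
def series_get(data, index, key, default=None):
    """Hypothetical model of Series.get(key, default) on plain lists."""
    if key is None:
        return None  # documented: Series.get(None) returns None
    try:
        pos = index.index(key)  # locate the label
    except ValueError:
        return default  # missing label: return the default instead of raising
    return data[pos]


city = ["London", "New York", "Helsinki"]
temps = [20, 21, 12]
print(series_get(temps, city, "London"))            # 20
print(series_get(temps, city, "Paris"))             # None
print(series_get(temps, city, "Paris", default=0))  # 0
```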

Binary Operator Functions

DolphinDB pandas supports all binary operator functions of Python pandas.Series.

Note:

  • The parameter level is not supported.

  • The parameter other in all functions can only be a scalar, a DolphinDB vector, a list, or a Series.

  • If the parameter other in eq is a scalar, the parameter fill_value cannot be specified.

  • The parameter func in combine accepts only DolphinDB built-in functions. User-defined functions are not supported.
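For the scalar and list forms of other, the following hypothetical sketch shows the assumed pandas semantics. A scalar is broadcast to every element. A list is combined element by element by position, so its length must match the length of the Series. `add_operand` is an illustrative name and is not part of DolphinDB pandas. Series operands are aligned by label instead, as described under Series Calculation.

```python
def add_operand(values, other):
    """Hypothetical sketch: combine Series values with a scalar or list `other`."""
    if isinstance(other, (int, float)):
        # Scalar: broadcast to every element
        return [v + other for v in values]
    if isinstance(other, list):
        # List: combined by position, so the lengths must match
        if len(other) != len(values):
            raise ValueError("list operand must have the same length as the Series")
        return [v + o for v, o in zip(values, other)]
    raise TypeError("this sketch only models scalar and list operands")


print(add_operand([20, 21, 12], 1))          # [21, 22, 13]
print(add_operand([20, 21, 12], [1, 2, 3]))  # [21, 23, 15]
```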

Function Application / GroupBy / Window

The parameter func of the following functions can only be a pandas built-in function or a user-defined function (including a lambda expression). The parameter **kwargs is currently not supported.

Function | Compatibility Statement
Series.apply(func[, args]) | The parameter convert_dtype is not supported.
Series.agg(func) | The parameter axis is not supported.
Series.aggregate(func) | The parameter axis is not supported.
Series.transform(func) | The parameter axis is not supported.
Series.groupby(by, dropna=True) | Parameters axis, level, as_index, sort, group_keys, squeeze, and observed are not supported.
rolling | Parameters center, win_type, on, axis, closed, step, and method are not supported.
ewm | Parameters ignore_na, axis, times, and method are not supported.
map | The parameter arg cannot be a Series.
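The following sketch illustrates what a window function computes by implementing a trailing rolling mean in plain Python. It assumes the pandas default of min_periods equal to the window size, so positions without a full window yield NaN. `rolling_mean` is a hypothetical helper that stands in for `Series.rolling(window).mean()`. It does not model the other window options.

```python
import math


def rolling_mean(values, window):
    """Hypothetical sketch of rolling(window).mean() with min_periods=window."""
    if window < 1:
        raise ValueError("window must be >= 1")
    out = []
    for i in range(len(values)):
        if i + 1 < window:
            out.append(math.nan)  # not enough observations for a full window yet
        else:
            chunk = values[i + 1 - window : i + 1]  # the trailing window ending at i
            out.append(sum(chunk) / window)
    return out


print(rolling_mean([20, 21, 12, 15], window=2))  # [nan, 20.5, 16.5, 13.5]
```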

Computations / Descriptive Statistics

DolphinDB pandas supports the following functions related to computations / descriptive stats of Python pandas.Series: abs, all, any, autocorr, between, corr, count, cov, diff, kurt, kurtosis, mad, max, mean, median, min, mode, nlargest, nsmallest, prod, sem, skew, std, sum, var, unique, nunique, is_unique, is_monotonic_increasing, is_monotonic_decreasing, value_counts.

Note:

  • Parameters axis, skipna, bool_only, fill_value, min_count, and ddof are not supported by any of these functions.

  • The parameter level is supported only by count, mad, max, mean, median, min, prod, skew, sum, sem, std, and var.

  • Some other parameters are currently not supported by certain functions, as shown in the following table.

Function | Compatibility Statement
corr | Parameters method and min_periods are not supported.
cov | Parameters min_periods and ddof are not supported.
diff | The parameter periods is not supported.
kurt | No parameters are supported.
kurtosis | No parameters are supported.
std | The parameter ddof is not supported.
value_counts | The parameter bins is not supported.

Reindexing / Selection / Label Manipulation

DolphinDB pandas supports the following functions related to reindexing / selection / label manipulation of Python pandas.Series: drop, drop_duplicates, first, head, idxmax, idxmin, isin, last, reindex, reset_index, take.

Some parameters are currently not supported by certain functions, as shown in the following table.

Function | Compatibility Statement
drop | Parameters axis, columns, inplace, and errors are not supported.
drop_duplicates | Parameters inplace and ignore_index are not supported.
idxmax/idxmin | The parameter axis is not supported.
reindex | Parameters copy, level, and tolerance are not supported.
reset_index | Parameters inplace and allow_duplicates are not supported.
head | The parameter n cannot be 0.
take | The parameter axis is not supported.

Missing Data Handling

DolphinDB pandas supports the following functions related to missing data handling of Python pandas.Series: backfill, isnull, notnull, pad.

Note: Parameters axis, downcast, inplace, and limit are not supported by any of these functions.

Reshaping / Sorting

DolphinDB pandas supports the following functions related to reshaping / sorting of Python pandas.Series: argsort, argmin, argmax, sort_values, sort_index.

Note: The parameter axis is not supported by any of these functions. Some other parameters are currently not supported by certain functions, as shown in the following table.

Function | Compatibility Statement
argsort | Parameters kind and na_position are not supported.
sort_values/sort_index | Parameters inplace, kind, and na_position are not supported.

Combining / Comparing / Joining / Merging

DolphinDB pandas supports all functions related to combining / comparing / joining / merging of Python pandas.Series. However, some parameters are currently not supported by certain functions, as shown in the following table.

Function | Compatibility Statement
compare | The parameter align_axis can only be 1.
update |

Time Series-Related

DolphinDB pandas supports most functions related to time series of Python pandas.Series, except tz_convert and tz_localize. However, some parameters are currently not supported by certain functions, as shown in the following table.

Function | Compatibility Statement
asfreq | Parameters method, how, normalize, and fill_value are not supported.
asof | The parameter subset is not supported.
shift | The parameter axis is not supported. The parameter freq accepts only the values of the rule parameter of the DolphinDB function asFreq.
resample | Parameters on, loffset, base, and group_keys are not supported.
at_time | Parameters asof and axis are not supported.
between_time | Parameters include_start, include_end, and axis are not supported.
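Of the functions above, shift can be easy to misread. Without freq, it moves values by a number of positions while the index stays in place. The following pure-Python sketch assumes those pandas semantics. `shift` here is a hypothetical helper that operates on a plain list and is not the Series method.

```python
import math


def shift(values, periods=1):
    """Hypothetical sketch of Series.shift(periods) without freq.

    Values move by `periods` positions (forward if positive, backward if
    negative); the index would stay unchanged, and vacated slots become NaN.
    """
    n = len(values)
    if periods >= 0:
        k = min(periods, n)
        return [math.nan] * k + list(values[: n - k])
    k = min(-periods, n)
    return list(values[k:]) + [math.nan] * k


print(shift([20, 21, 12], 1))   # [nan, 20, 21]
print(shift([20, 21, 12], -1))  # [21, 12, nan]
```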

Creating a Series

  • If the parameter index is not specified, it defaults to a RangeIndex (0, 1, 2, …, n-1), where n is the length of the Series.

    s = pd.Series([20, 21, 12])
    print(s)

    Output:

    0 20
    1 21
    2 12
    dtype: INT

  • Specify the parameter index as a single column

    s = pd.Series([20, 21, 12], ['London', 'New York', 'Helsinki'])
    print(s)

    Output:

    London 20
    New York 21
    Helsinki 12
    dtype: INT

Accessing Data in a Series

city = ['London', 'New York', 'Helsinki']
s = pd.Series([20, 21, 12], city)
  1. Accessing Data by Position

    • Access a single element by integer position

      s[0] 
      s.iloc[0] 
      s.iat[0]

      Output: 20

    • Access multiple rows by integer positions

      s[[0,2]]
      s.iloc[[0,2]]

      Output:

      London 20
      Helsinki 12
      dtype: INT

    • Access multiple rows by slice

      s[0:3]
      s.iloc[0:3]

      Output:

      London 20
      New York 21
      Helsinki 12
      dtype: INT

      s[:2]
      s.iloc[:2]

      Output:

      London 20
      New York 21
      dtype: INT

      s[1:]
      s.iloc[1:]

      Output:

      New York 21
      Helsinki 12
      dtype: INT

      s[:]
      s.iloc[:]

      Output:

      London 20
      New York 21
      Helsinki 12
      dtype: INT

  2. Accessing Data by Label

    • Access a single element by index label

      s['London']
      s.loc['London']

      Output: 20

    • Access multiple elements by index labels

      s[['London', 'New York']]
      s.loc[['London', 'New York']]

      Output:

      London 20
      New York 21
      dtype: INT

    • Access multiple elements by slice

      s['London':'Helsinki']
      s.loc['London':'Helsinki']

      Output:

      London 20
      New York 21
      Helsinki 12
      dtype: INT

      s[:]
      s.loc[:]

      Output:

      London 20
      New York 21
      Helsinki 12
      dtype: INT

    Note: Using [:] together with multiple index labels is not supported. For example, the following expressions are not supported:

    s[:, 'A']
    s.loc[:, 'A']
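The examples above differ in one detail: a positional slice such as s.iloc[0:2] excludes its stop position, while a label slice such as s.loc['London':'Helsinki'] includes both endpoints. The pure-Python sketch below models that rule under pandas semantics. `slice_by_position` and `slice_by_label` are hypothetical helpers, and they assume unique labels.

```python
def slice_by_position(index, data, start=None, stop=None):
    # Positional slice follows Python rules: the stop position is excluded
    return list(zip(index, data))[start:stop]


def slice_by_label(index, data, start_label, stop_label):
    # Label slice includes both endpoints, hence the + 1 on the stop position
    start = index.index(start_label)
    stop = index.index(stop_label)
    return list(zip(index, data))[start:stop + 1]


city = ["London", "New York", "Helsinki"]
values = [20, 21, 12]
print(slice_by_position(city, values, 0, 2))
# [('London', 20), ('New York', 21)]
print(slice_by_label(city, values, "London", "New York"))
# [('London', 20), ('New York', 21)]
```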

Operating a Series

  1. Adding elements

    Currently not supported.

  2. Updating elements

    • Update elements directly: modification by index label is supported, while modification by position is not supported.

      s.loc["London"] = 33  # supported: assignment by label
      s[1] = 33             # not supported: assignment by position

    • Update elements using the update method

      city = ['London', 'New York', 'Helsinki']
      s = pd.Series([20, 21, 12], city)
      s.update(pd.Series([50,60], index=['London', 'Helsinki']))
      s

      Output:

      London 50
      New York 21
      Helsinki 60
      dtype: INT

  3. Removing elements

    • Remove elements using the drop method

      s.drop(["London"])
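The update method can be modeled as an in-place overwrite matched by label. The following sketch assumes pandas semantics. Missing values in other (None or NaN) never overwrite existing values, and labels that exist only in other are ignored. `update` is a hypothetical helper over plain lists and is not the DolphinDB pandas implementation.

```python
import math


def update(data, index, other_data, other_index):
    """Hypothetical model of Series.update on plain lists (modifies `data` in place)."""
    position = {label: i for i, label in enumerate(index)}
    for label, value in zip(other_index, other_data):
        if label not in position:
            continue  # label exists only in `other`: ignored
        if value is None or (isinstance(value, float) and math.isnan(value)):
            continue  # missing values in `other` never overwrite existing ones
        data[position[label]] = value


city = ["London", "New York", "Helsinki"]
data = [20, 21, 12]
update(data, city, [50, 60], ["London", "Helsinki"])
print(data)  # [50, 21, 60]
```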

Series Calculation

Basic calculation rule: elements from different Series are aligned by index label before the calculation. If a label exists in one Series but not in the other, the result for that label is NaN, as with "Helsinki" and "New York" in the following script.

s = pd.Series([12, 15], ["London", "New York"])
city = ['London', 'Helsinki']
s1 = pd.Series([33, 25], city)
s + s1

Output:

Helsinki NaN
London 45
New York NaN
dtype: INT
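The alignment rule can be sketched in pure Python, using dicts as hypothetical stand-ins for Series. The result covers the union of labels. The sketch sorts that union, which matches the order in the output above, and yields NaN wherever a label is missing from either operand. DolphinDB pandas keeps the INT type and stores nulls, whereas the sketch uses Python's math.nan. `aligned_add` is an illustrative name only.

```python
import math


def aligned_add(left, right):
    """Hypothetical sketch of label-aligned addition; dicts map label -> value."""
    result = {}
    for label in sorted(set(left) | set(right)):  # union of labels, sorted
        if label in left and label in right:
            result[label] = left[label] + right[label]
        else:
            result[label] = math.nan  # label missing on one side
    return result


s = {"London": 12, "New York": 15}
s1 = {"London": 33, "Helsinki": 25}
print(aligned_add(s, s1))
# {'Helsinki': nan, 'London': 45, 'New York': nan}
```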