encodeShortGenomeSeq

Syntax

encodeShortGenomeSeq(X)

Alias: encodeSGS

Arguments

X is a scalar/vector of STRING/CHAR type.

Details

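encodeShortGenomeSeq encodes short DNA sequences (up to 28 characters) composed of the letters A, T, C, and G as LONG integers. Storing the encoded values instead of the original strings reduces the storage space needed for DNA sequences and can improve performance.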

Note:
  • When X is an empty string (""), the function returns 0.

  • When X contains any character other than uppercase A, T, C, or G (for example, a lowercase letter, a space, or N), the function returns NULL.

  • When the length of X exceeds 28 characters, the function returns NULL.

Return Value: A LONG scalar or a FAST LONG vector.
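
The rules above can be modeled with a short Python sketch. This is not DolphinDB code and not the documented implementation: the bit layout (2 bits per base with A=0, T=1, C=2, G=3, and the sequence length stored in the byte above the packed bases) is an assumption inferred from the example outputs on this page. The function name encode_short_genome_seq_sketch is hypothetical, and None stands in for NULL.

```python
from typing import Optional

# 2-bit codes inferred from the example outputs on this page (an assumption).
BASE_CODES = {"A": 0, "T": 1, "C": 2, "G": 3}
MAX_LEN = 28  # longer inputs yield NULL according to the notes above


def encode_short_genome_seq_sketch(seq: str) -> Optional[int]:
    """Hypothetical model of encodeShortGenomeSeq; None stands in for NULL."""
    if len(seq) > MAX_LEN:
        return None
    packed = 0
    for base in seq:
        code = BASE_CODES.get(base)  # case-sensitive lookup
        if code is None:
            return None  # any character other than A, T, C, G
        packed = (packed << 2) | code
    # Assumed layout: the bases fill whole bytes (4 bases per byte) and the
    # sequence length sits in the next byte above them.
    payload_bytes = (len(seq) + 3) // 4
    return (len(seq) << (8 * payload_bytes)) | packed


print(encode_short_genome_seq_sketch("TCGATCG"))     # 465691
print(encode_short_genome_seq_sketch("TCGATCGCCC"))  # 168216298
print(encode_short_genome_seq_sketch(""))            # 0
print(encode_short_genome_seq_sketch("TCtG"))        # None
```

Under this assumed layout, 28 bases occupy 56 bits and the length takes one more byte, so the result fits in a 64-bit LONG. That would be consistent with the 28-character limit.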

Examples

a=encodeShortGenomeSeq("TCGATCG")
a;
// output
465691
typestr(a)
// output
LONG

b=encodeShortGenomeSeq("TCGATCG" "TCGATCGCCC")
b;
// output
[465691,168216298]
typestr(b)
// output
FAST LONG VECTOR

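//The first element is NULL because "TCGATCG" repeated 5 times is 35 characters long, which exceeds the 28-character limit.
//"TCGAT" repeated 5 times is 25 characters long, so it is encoded normally.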
encodeShortGenomeSeq(repeat("TCGATCG" "TCGAT", 5))
// output
[,1801916404867712433]

y=toCharArray("TCGATCGCCC")
encodeShortGenomeSeq(y)
// output
168216298

//NULL is returned in the following cases because each input contains a character other than A, T, C, G (a space, a lowercase t, or N)
encodeShortGenomeSeq("TC G")
encodeShortGenomeSeq("TCtG")
encodeShortGenomeSeq("NNNNNNNNTCGGGGCAT")
encodeShortGenomeSeq("TCGGGGCATNGCCCG")
encodeShortGenomeSeq("GCCCGATNNNNN")

Related functions: decodeShortGenomeSeq, genShortGenomeSeq