genShortGenomeSeq

Syntax

genShortGenomeSeq(X, window)

Alias: genSGS

Details

This function slides a fixed-size window, measured in characters, over the input DNA sequence. It encodes the characters in each window and returns an integer vector containing the encoded values.

Note:

  • The window slides forward from the first character of the sequence, one character at a time. At each position, the window covers the current character and the characters that follow it, window characters in total.

  • In the examples below, positions where fewer than window characters remain, or where the window contains a character such as N, are returned as NULL.

  • If window exceeds the total length of X, no complete window exists, so every element of the returned vector is NULL (see the last example).
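
The sliding-window behavior described above can be modeled outside DolphinDB with a short Python sketch. This is not the library's implementation. The function name gen_short_genome_seq_model is hypothetical, and the bit layout (2-bit codes A=0, T=1, C=2, G=3, with the window length stored above the base bits at a byte-aligned offset) is an assumption inferred only from the example outputs on this page. It reproduces the documented outputs for windows 3 to 6; behavior for other window sizes is a guess.

```python
from typing import List, Optional

# Assumed 2-bit base codes, inferred from the examples on this page (not a spec).
BASE_CODE = {"A": 0, "T": 1, "C": 2, "G": 3}


def gen_short_genome_seq_model(x: str, window: int) -> List[Optional[int]]:
    """Hypothetical model of the sliding-window encoding; None stands for NULL."""
    if not 2 <= window <= 28:
        raise ValueError("window must be in [2, 28]")
    # Assumption: the window length sits above the base bits at a byte-aligned
    # offset (8 bits for window 2-4, 16 bits for window 5-8, and so on).
    shift = 8 * ((window + 3) // 4)
    out: List[Optional[int]] = []
    for start in range(len(x)):
        chunk = x[start:start + window]
        # NULL when fewer than `window` characters remain, or when the
        # window contains a character other than A, C, G, T (such as N).
        if len(chunk) < window or any(c not in BASE_CODE for c in chunk):
            out.append(None)
            continue
        value = 0
        for c in chunk:
            value = (value << 2) | BASE_CODE[c]  # append 2 bits per base
        out.append((window << shift) | value)
    return out


if __name__ == "__main__":
    print(gen_short_genome_seq_model("TCGGGGCAT", 3))
    # [795, 815, 831, 831, 830, 824, 801, None, None]
```

The result always has one element per input character, which is why a window longer than X yields a vector consisting only of NULL values rather than a zero-length vector.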

Parameters

X is a STRING scalar or CHAR vector.

window is a positive integer in [2,28].

Returns

Returns an integer vector with one element per character of X. The vector type depends on window, as specified below:

window Range    Return Type
[2,4]           FAST SHORT VECTOR
[5,12]          FAST INT VECTOR
[13,28]         FAST LONG VECTOR

Examples

genShortGenomeSeq("NNNNNNNNTCGGGGCAT",3)
// output: [,,,,,,,,795,815,831,831,830,824,801,,]

genShortGenomeSeq("TCGGGGCATNGCCCG",4)
// output: [1135,1215,1279,1278,1272,1249,,,,,1258,1195,,,]

genShortGenomeSeq("GCCCGATNNNNN",6)
// output: [396972,395953,,,,,,,,,,]

genShortGenomeSeq("TCGATCGTCGATCGTCGATCGTCGATCGG",5)
// output: [328113,328390,328475,327789,328118,328411,328556,328113,328390,328475,327789,328118,328411,328556,328113,328390,328475,327789,328118,328411,328556,328113,328390,328475,327791,,,,]

genShortGenomeSeq("ACTT",8)
// output: [,,,]

Related functions: encodeShortGenomeSeq, decodeShortGenomeSeq