padSeqEnds - Pads ragged ends of aligned DNA sequences

Description

padSeqEnds takes a vector of DNA sequences, as character strings, and pads each sequence with the appropriate number of "N" characters (or another padding character) so that every sequence in the returned vector has the same length. Padding is added to the end of each sequence by default.

Usage

padSeqEnds(seq, len = NULL, start = FALSE, pad_char = "N", mod3 = TRUE)

Arguments

seq
character vector of DNA sequence strings.
len
length to pad to. Only applies if greater than the length of the longest sequence in seq.
start
if TRUE pad the beginning of each sequence instead of the end.
pad_char
character to use for padding.
mod3
if TRUE pad sequences so that the final length is a multiple of three.

Value

A modified seq vector with padded sequences.
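
The padding rules described by the arguments above can be sketched in base R. This is a minimal illustration, not the package's implementation: the function name pad_seq_sketch is hypothetical, and the sketch assumes that the target length is the larger of len and the longest input sequence, rounded up to the next multiple of three when mod3 is TRUE.

```r
# Hypothetical helper that mirrors the documented arguments of padSeqEnds.
# Assumption: target length = max(longest sequence, len), rounded up to a
# multiple of three when mod3 = TRUE.
pad_seq_sketch <- function(seq, len = NULL, start = FALSE,
                           pad_char = "N", mod3 = TRUE) {
  width <- nchar(seq)
  # max() ignores a NULL len, so the longest sequence is the fallback
  target <- max(width, len)
  if (mod3 && target %% 3 != 0) {
    target <- target + (3 - target %% 3)
  }
  # One padding string per sequence, sized to fill the gap to the target
  padding <- strrep(pad_char, target - width)
  if (start) paste0(padding, seq) else paste0(seq, padding)
}

seq <- c("CCCCTGGG", "ACCCTG", "CCCC")
pad_seq_sketch(seq)
# [1] "CCCCTGGGN" "ACCCTGNNN" "CCCCNNNNN"
pad_seq_sketch(seq, mod3 = FALSE, pad_char = "-")
# [1] "CCCCTGGG" "ACCCTG--" "CCCC----"
```

With mod3 = FALSE the longest sequence is left untouched, which shows why the default call pads even the longest input by one character.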

Examples

# Default behavior pads ragged ends to a uniform length
# (rounded up to a multiple of three because mod3 = TRUE)
seq <- c("CCCCTGGG", "ACCCTG", "CCCC")
padSeqEnds(seq)

[1] "CCCCTGGGN" "ACCCTGNNN" "CCCCNNNNN"


# Pad to fixed length
padSeqEnds(seq, len=15)

[1] "CCCCTGGGNNNNNNN" "ACCCTGNNNNNNNNN" "CCCCNNNNNNNNNNN"


# Add padding to the beginning of the sequences instead of the ends
padSeqEnds(seq, start=TRUE)

[1] "NCCCCTGGG" "NNNACCCTG" "NNNNNCCCC"

# Add padding to the beginning of the sequences up to a fixed length
padSeqEnds(seq, len=15, start=TRUE)

[1] "NNNNNNNCCCCTGGG" "NNNNNNNNNACCCTG" "NNNNNNNNNNNCCCC"

See also

See maskSeqEnds for creating uniform masking from existing masking.