seqDist - Calculate distance between two sequences


Description

seqDist calculates the distance between two DNA sequences.


Usage

seqDist(seq1, seq2, dist_mat = getDNAMatrix())


Arguments

seq1: character string containing a DNA sequence.
seq2: character string containing a DNA sequence.
dist_mat: Character distance matrix. Defaults to a Hamming distance matrix returned by getDNAMatrix. If the gap characters c("-", ".") are assigned a value of -1 in dist_mat, then each contiguous run of gaps, of any length, that is not present in both sequences is counted as a distance of 1. In other words, an indel of any length increases the sequence distance by exactly 1. With any gap value other than -1, gap positions are scored like any other character and indels receive no special treatment.
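The gap = -1 rule for dist_mat can be illustrated with a small base-R sketch. This is not the alakazam implementation: toy_dist_mat and seq_dist_sketch are hypothetical helpers, and toy_dist_mat is a reduced stand-in for getDNAMatrix that only covers A, C, G, T, N and the two gap characters. The sketch looks up every aligned position in the matrix, adds ordinary values directly, and adds 1 for each maximal run of consecutive positions scored -1 (a gap aligned against a base).

```r
# Hypothetical stand-in for getDNAMatrix(): a reduced Hamming-style matrix over
# A, C, G, T, N and the gap characters "-" and ".". `gap` is the value assigned
# to a gap character aligned against a base.
toy_dist_mat <- function(gap = -1) {
  bases <- c("A", "C", "G", "T", "N")
  gaps <- c("-", ".")
  chars <- c(bases, gaps)
  m <- matrix(1, nrow = length(chars), ncol = length(chars),
              dimnames = list(chars, chars))
  diag(m) <- 0
  # N is ambiguous and matches every base
  m["N", bases] <- 0
  m[bases, "N"] <- 0
  # Gap characters match each other
  m[gaps, gaps] <- 0
  # A gap versus a base takes the supplied gap value
  m[gaps, bases] <- gap
  m[bases, gaps] <- gap
  m
}

# Hypothetical sketch of the documented scoring rule (not alakazam's code).
seq_dist_sketch <- function(seq1, seq2, dist_mat) {
  s1 <- strsplit(seq1, "")[[1]]
  s2 <- strsplit(seq2, "")[[1]]
  if (length(s1) != length(s2)) stop("sequences must have equal length")
  # Look up each aligned pair of characters by row and column name
  d <- dist_mat[cbind(s1, s2)]
  # Positions scored -1 are a gap aligned against a base
  is_indel <- d == -1
  # Every other position contributes its matrix value directly
  total <- sum(d[!is_indel])
  # Each maximal run of consecutive indel positions adds exactly 1
  runs <- rle(is_indel)
  total + sum(runs$values)
}

seq_dist_sketch("ATGGC", "AT--C", toy_dist_mat(gap = -1))  # 1
seq_dist_sketch("ATGGC", "AT--C", toy_dist_mat(gap = 1))   # 2
seq_dist_sketch("-TGGC", "AT--C", toy_dist_mat(gap = -1))  # 2
```

One assumption is built into this sketch: a position where both sequences carry a gap scores 0 and therefore ends a run, so "A--GC" versus "AT--C" would count as two indels here. That case is not covered by the examples on this page, so seqDist itself may treat it differently.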


Value

Numerical distance between seq1 and seq2.


Examples

# Ungapped examples
seqDist("ATGGC", "ATGGG")

[1] 1

seqDist("ATGGC", "ATG??")

[1] 2

# Gaps will be treated as Ns with a gap=0 distance matrix
seqDist("ATGGC", "AT--C", dist_mat=getDNAMatrix(gap=0))

[1] 0

# Gaps will be treated as universally non-matching characters with gap=1
seqDist("ATGGC", "AT--C", dist_mat=getDNAMatrix(gap=1))

[1] 2

# Gaps of any length will be treated as single mismatches with a gap=-1 distance matrix
seqDist("ATGGC", "AT--C", dist_mat=getDNAMatrix(gap=-1))

[1] 1

# Gaps present at the same positions in both sequences are not counted
seqDist("ATG-C", "ATG-C", dist_mat=getDNAMatrix(gap=-1))

[1] 0

# Overlapping or adjacent runs of gap characters are counted as a single gap
seqDist("ATG-C", "AT--C", dist_mat=getDNAMatrix(gap=-1))

[1] 1

seqDist("A-GGC", "AT--C", dist_mat=getDNAMatrix(gap=-1))

[1] 1

seqDist("AT--C", "AT--C", dist_mat=getDNAMatrix(gap=-1))

[1] 0

# Discontiguous runs of gap characters each count as separate gaps
seqDist("-TGGC", "AT--C", dist_mat=getDNAMatrix(gap=-1))

[1] 2

See also

A nucleotide distance matrix may be built with getDNAMatrix, and an amino acid distance matrix with getAAMatrix. seqDist is used by pairwiseDist to generate distance matrices. See seqEqual for testing sequence equivalence.