Class MinimumEditPath<T>

  • Direct Known Subclasses:
    MinimumEditPathString

    public class MinimumEditPath<T>
    extends Object
    Implementation of the Wagner-Fischer algorithm to determine the minimum edit path (or distance) between two sequences. While traditionally this is between two strings (sequences of characters), the implementation uses templates to support sequences of any type, and interfaces for comparators, to support different settings and methods for determining edit distance.

    A traditional Levenstein distance calculation can be achieved with MinimumEditPath<Character>.

    e.g. The edit path between two lists of Integers can be determined using:

     MinimumEditPath<Integer> mp = new MinimumEditPath<Integer>();
     List<EditStep<Integer>> path = mp.minimumEditPath(vFrom, vTo)
     for (EditStep<Integer> step: path) {
       System.out.println("from " + step.getFrom() + " to " + step.getTo() 
                          + " : " + step.getOperation() + " distance " + step.getStepDistance());
     }

    The equality comparison can be customized, e.g.:

     MinimumEditPath<String> mp = new MinimumEditPath<String>(
       new DefaultEditComparator<String>(new EqualsComparator<String>() {
         public int compare(String o1, String o2) {
          return o1.toLowerCase().compareTo(o2.toLowerCase());
         }
        }));
    Author:
    Robert Fromont robert@fromont.net.nz
    • Constructor Detail

      • MinimumEditPath

        public MinimumEditPath()
        Default constructor
      • MinimumEditPath

        public MinimumEditPath​(EditComparator<T> comparator)
        Constructor
        Parameters:
        comparator - Element comparator to use.
    • Method Detail

      • getComparator

        public EditComparator<T> getComparator()
        Getter for comparator: The comparator used when determining the distance between two sequence elements.
        Returns:
        The comparator used when determining the distance between two sequence elements.
      • setComparator

        public MinimumEditPath<T> setComparator​(EditComparator<T> Newcomparator)
        Setter for comparator: The comparator used when determining the distance between two sequence elements.
        Parameters:
        Newcomparator - The comparator used when determining the distance between two sequence elements.
      • minimumEditPath

        public List<EditStep<T>> minimumEditPath​(List<T> from,
                                                 List<T> to)
        Computes the minimum path from one sequence to another.
        Parameters:
        from - The source (original) sequence.
        to - The destination (final) sequence.
        Returns:
        The edit path between the two sequences that has the minimum edit distance.
      • minimumEditDistance

        public double minimumEditDistance​(List<T> from,
                                          List<T> to)
        Computes the minimum edit distance between two sequences.
        Parameters:
        from - The source (original) sequence.
        to - The destination (final) sequence.
        Returns:
        The minimum edit distance between the two sequences.
      • collapse

        public List<EditStep<T>> collapse​(List<EditStep<T>> path)
        Collapses edit path so that subsequent delete/create steps are collapsed into a single change step, except those that would result in a change more than 3 times more costly than a delete and insert.
        Parameters:
        path - The path to collapse
        Returns:
        The given path, with possibly fewer steps.
      • collapse

        public List<EditStep<T>> collapse​(List<EditStep<T>> path,
                                          boolean ifChangeIsReasonable)
        Collapses edit path so that subsequent delete/create steps are collapsed into a single change step.
        Parameters:
        path - The path to collapse
        ifChangeIsReasonable - If true, then any collapsing that would result in a change more than 3 times more costly than a delete and insert will not be collapsed. If false, all insert/delete pairs will be collapsed regardless of cost.
        Returns:
        The given path, with possibly fewer steps.
      • errorRate

        public double errorRate​(List<EditStep<T>> path)
        Return the error rate for the given path.

        Error rate = (S+D+I)/(S+D+C)
        Where:

        • C = number of correct elements
        • S = number of substitutions
        • D = number of deletions
        • I = number of insertions
        Parameters:
        path -
        Returns:
        The error rate.