Class ConventionTransformer

  • All Implemented Interfaces:
    Function<Graph,Graph>, UnaryOperator<Graph>, GraphTransformer

    public class ConventionTransformer
    extends Object
    implements GraphTransformer
    Transforms a text convention on a source layer into annotations on destination layers.

    Annotations on sourceLayerId are scanned, and where a label matches the sourcePattern regular expression, annotations are added (or modified, in the case of the source layer) on the layers specified by the keys of destinationResults. The values of this map are used as the labels for the annotations on the corresponding layers. These values can contain references to groups captured by sourcePattern (e.g. $1), in which case the content of the corresponding group is substituted into the label.
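    The substitution step can be sketched with java.util.regex alone. This is a minimal illustration under stated assumptions, not the class's implementation: it assumes the whole label must match sourcePattern and that $n references follow java.util.regex replacement syntax. The names ConventionSketch and expand are hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch of the label substitution, not ConventionTransformer's own code.
final class ConventionSketch {

    /**
     * Returns the label each destination layer would receive, or null if the
     * source label does not match. Assumes a whole-label match and
     * java.util.regex replacement syntax ($1, $2, ...).
     */
    static Map<String, String> expand(String sourcePattern,
                                      Map<String, String> destinationResults,
                                      String sourceLabel) {
        Pattern pattern = Pattern.compile(sourcePattern);
        Matcher matcher = pattern.matcher(sourceLabel);
        if (!matcher.matches()) {
            return null; // label does not follow the convention
        }
        Map<String, String> labels = new LinkedHashMap<>();
        for (Map.Entry<String, String> result : destinationResults.entrySet()) {
            matcher.reset();
            matcher.matches(); // re-run the match so its groups are available
            StringBuffer label = new StringBuffer();
            // The match spans the whole label, so this appends only the expanded template.
            matcher.appendReplacement(label, result.getValue());
            labels.put(result.getKey(), label.toString());
        }
        return labels;
    }

    public static void main(String[] args) {
        Map<String, String> results = new LinkedHashMap<>();
        results.put("word", "$1");
        results.put("pos", "$2");
        System.out.println(expand("(.+)_(.+)", results, "the_DT")); // {word=the, pos=DT}
    }
}
```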

    Some examples:

    To convert words in the format orthography_pos into words with POS tags:

    • sourceLayerId: word
    • sourcePattern: (.+)_(.+)
    • destinationResults:
      • "word" = "$1"
      • "pos" = "$2"
    So a word labelled "the_DT" will end up being labelled "the" and tagged with "DT" on the "pos" layer.
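    Because the first group in (.+)_(.+) is greedy, a label containing several underscores is split at the last one: the POS tag can never contain an underscore, but the orthography can. The hypothetical helper below (GreedySplitDemo.split, using only java.util.regex) contrasts this with a lazy first group, which splits at the first underscore instead.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper showing how an orthography_pos pattern splits a label.
final class GreedySplitDemo {

    /** Returns {group 1, group 2} for a whole-label match, or null if there is none. */
    static String[] split(String regex, String label) {
        Matcher m = Pattern.compile(regex).matcher(label);
        return m.matches() ? new String[] {m.group(1), m.group(2)} : null;
    }

    public static void main(String[] args) {
        // Greedy first group: $1 runs up to the LAST underscore.
        String[] greedy = split("(.+)_(.+)", "New_York_NNP");
        System.out.println(greedy[0] + " | " + greedy[1]); // New_York | NNP

        // Lazy first group: $1 stops at the FIRST underscore.
        String[] lazy = split("(.+?)_(.+)", "New_York_NNP");
        System.out.println(lazy[0] + " | " + lazy[1]);     // New | York_NNP
    }
}
```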

    To strip a leading & disfluency marker from words and tag those words with DIS on the "disfluency" layer:

    • sourceLayerId: word
    • sourcePattern: &(.+)
    • destinationResults:
      • "word" = "$1"
      • "disfluency" = "DIS"
    So a word labelled "&th" will end up being labelled "th" and tagged with "DIS" on the "disfluency" layer.

    To convert words in square brackets into noise annotations:

    • sourceLayerId: word
    • sourcePattern: \[(.+)\]
    • destinationResults:
      • "noise" = "$1"
    So a word labelled "[coughs]" will be deleted and replaced by an annotation labelled "coughs" on the "noise" layer. Note that in this case destinationResults contains no key for the source layer, so the source annotation is deleted.
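    What happens to a matching source annotation depends only on whether destinationResults has a key for the source layer. The sketch below models that decision with hypothetical names (SourceFateSketch, Outcome); it takes the per-layer labels as already substituted and does not touch real Graph or annotation objects.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical model of the fate of one matching source annotation.
final class SourceFateSketch {

    /** Result for one matching source annotation. */
    static final class Outcome {
        final String newSourceLabel; // null means the source annotation is deleted
        final Map<String, String> added = new LinkedHashMap<>(); // layer ID -> new label

        Outcome(String newSourceLabel) {
            this.newSourceLabel = newSourceLabel;
        }
    }

    /** labelsByLayer holds the already-substituted label for each key of destinationResults. */
    static Outcome apply(String sourceLayerId, Map<String, String> labelsByLayer) {
        // A key for the source layer relabels the source; no such key deletes it.
        Outcome outcome = new Outcome(labelsByLayer.get(sourceLayerId));
        for (Map.Entry<String, String> e : labelsByLayer.entrySet()) {
            // Every other (non-null) layer ID gets a new annotation.
            if (e.getKey() != null && !e.getKey().equals(sourceLayerId)) {
                outcome.added.put(e.getKey(), e.getValue());
            }
        }
        return outcome;
    }

    public static void main(String[] args) {
        Map<String, String> pos = new LinkedHashMap<>();
        pos.put("word", "the"); // labels computed from "the_DT"
        pos.put("pos", "DT");
        Map<String, String> noise = new LinkedHashMap<>();
        noise.put("noise", "coughs"); // labels computed from "[coughs]"

        Outcome tagged = apply("word", pos);
        Outcome deleted = apply("word", noise);
        System.out.println(tagged.newSourceLabel + " " + tagged.added);   // the {pos=DT}
        System.out.println(deleted.newSourceLabel + " " + deleted.added); // null {noise=coughs}
    }
}
```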
    Author:
    Robert Fromont robert@fromont.net.nz
    • Constructor Detail

      • ConventionTransformer

        public ConventionTransformer()
        Default constructor.
      • ConventionTransformer

        public ConventionTransformer(String sourceLayerId,
                                     String sourcePattern,
                                     HashMap<String,String> destinationResults)
        Constructor from attribute values.
        Parameters:
        sourceLayerId - Layer ID of the annotations to transform.
        sourcePattern - Regular expression in the source layer which triggers transformation of the annotation. This may capture groups, which can be copied into the destination or source layers.
        destinationResults - A map of layer IDs to label values which may include references to groups captured in the sourcePattern.
      • ConventionTransformer

        public ConventionTransformer(String sourceLayerId,
                                     String sourcePattern)
        Constructor from attribute values. Destination results must be subsequently added using setDestinationResults(HashMap) or addDestinationResult(String,String).
        Parameters:
        sourceLayerId - Layer ID of the annotations to transform.
        sourcePattern - Regular expression in the source layer which triggers transformation of the annotation. This may capture groups, which can be copied into the destination or source layers.
      • ConventionTransformer

        public ConventionTransformer(String sourceLayerId,
                                     String sourcePattern,
                                     String sourceResult)
        Constructor from attribute values. Destination results must be subsequently added using setDestinationResults(HashMap) or addDestinationResult(String,String).
        Parameters:
        sourceLayerId - Layer ID of the annotations to transform.
        sourcePattern - Regular expression in the source layer which triggers transformation of the annotation. This may capture groups, which can be copied into the destination or source layers.
        sourceResult - The new label for the source annotation, which may include groups captured in sourcePattern.
      • ConventionTransformer

        public ConventionTransformer(String sourceLayerId,
                                     String sourcePattern,
                                     String sourceResult,
                                     String destinationLayerId,
                                     String destinationResult)
        Utility constructor for the common scenario of identifying a pattern on one layer and, where it occurs, changing the label on the source layer and adding an annotation on a second layer.

        For example, to tag disfluencies marked with a leading & with the label DIS: new ConventionTransformer("word", "&(.+)", "$1", "disfluency", "DIS")
        ...which strips the word annotation of the leading &, and tags the word on the "disfluency" layer.
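        Assuming labels are expanded with java.util.regex replacement rules (as the $1 references in the class description suggest, though this page does not state it explicitly), the way a group reference is written matters. The hypothetical GroupReferenceDemo below uses only String.replaceAll to compare $1 with the Java string literal "\\1".

```java
// Hypothetical demo of java.util.regex group references in replacement strings.
final class GroupReferenceDemo {

    /** Relabels using String.replaceAll, i.e. java.util.regex replacement rules. */
    static String relabel(String label, String regex, String template) {
        return label.replaceAll(regex, template);
    }

    public static void main(String[] args) {
        // "$1" copies the text captured by group 1.
        System.out.println(relabel("&th", "&(.+)", "$1"));  // th
        // The Java literal "\\1" is the two characters \1; the backslash only
        // escapes the '1', so the replacement is a literal "1".
        System.out.println(relabel("&th", "&(.+)", "\\1")); // 1
    }
}
```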

        Parameters:
        sourceLayerId - Layer ID of the annotations to transform.
        sourcePattern - Regular expression in the source layer which triggers transformation of the annotation. This may capture groups, which can be copied into the destination or source layers.
        sourceResult - The new label for the source annotation, which may include groups captured in sourcePattern.
        destinationLayerId - The ID of the destination layer.
        destinationResult - The label for the annotation added on the destination layer, which may include groups captured in sourcePattern.
    • Method Detail

      • getSourceLayerId

        public String getSourceLayerId()
        Getter for sourceLayerId: Layer ID of the annotations to transform.
        Returns:
        Layer ID of the annotations to transform.
      • setSourceLayerId

        public ConventionTransformer setSourceLayerId(String newSourceLayerId)
        Setter for sourceLayerId: Layer ID of the annotations to transform.
        Parameters:
        newSourceLayerId - Layer ID of the annotations to transform.
      • getSourcePattern

        public String getSourcePattern()
        Getter for sourcePattern: Regular expression in the source layer which triggers transformation of the annotation. This may capture groups, which can be copied into the destination or source layers.
        Returns:
        Regular expression in the source layer which triggers transformation of the annotation. This may capture groups, which can be copied into the destination or source layers.
      • setSourcePattern

        public ConventionTransformer setSourcePattern(String newSourcePattern)
        Setter for sourcePattern: Regular expression in the source layer which triggers transformation of the annotation. This may capture groups, which can be copied into the destination or source layers.
        Parameters:
        newSourcePattern - Regular expression in the source layer which triggers transformation of the annotation. This may capture groups, which can be copied into the destination or source layers.
      • getDestinationResults

        public HashMap<String,String> getDestinationResults()
        Getter for destinationResults: A map of layer IDs to label values which may include references to groups captured in the sourcePattern.
        Returns:
        A map of layer IDs to label values which may include references to groups captured in the sourcePattern.
      • setDestinationResults

        public ConventionTransformer setDestinationResults(HashMap<String,String> newDestinationResults)
        Setter for destinationResults: A map of layer IDs to label values which may include references to groups captured in the sourcePattern.
        Parameters:
        newDestinationResults - A map of layer IDs to label values which may include references to groups captured in the sourcePattern.
      • addDestinationResult

        public ConventionTransformer addDestinationResult(String layerId,
                                                          String label)
        Add a destination result to destinationResults.
        Parameters:
        layerId - The layer on which the annotation will be added. This can be null, in which case no destination is specified, resulting in the annotations being stripped out.
        label - The label for the destination annotation, which may include groups captured in sourcePattern.
        Returns:
        This object.
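        Since addDestinationResult returns this object, and the setters are declared to return ConventionTransformer, configuration calls can be chained. The runnable sketch below mirrors that fluent style with a hypothetical stand-in class (ConventionConfig) so that it does not depend on the library; it is not the real class, and it assumes the setters also return the same instance.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical stand-in that mirrors ConventionTransformer's fluent setters.
final class ConventionConfig {
    private String sourceLayerId;
    private String sourcePattern;
    private final Map<String, String> destinationResults = new LinkedHashMap<>();

    ConventionConfig setSourceLayerId(String newSourceLayerId) {
        sourceLayerId = newSourceLayerId;
        return this; // returning this is what makes chaining possible
    }

    ConventionConfig setSourcePattern(String newSourcePattern) {
        sourcePattern = newSourcePattern;
        return this;
    }

    ConventionConfig addDestinationResult(String layerId, String label) {
        destinationResults.put(layerId, label); // LinkedHashMap also accepts a null layerId
        return this;
    }

    String getSourceLayerId() { return sourceLayerId; }
    String getSourcePattern() { return sourcePattern; }
    Map<String, String> getDestinationResults() { return destinationResults; }

    public static void main(String[] args) {
        ConventionConfig disfluency = new ConventionConfig()
            .setSourceLayerId("word")
            .setSourcePattern("&(.+)")
            .addDestinationResult("word", "$1")
            .addDestinationResult("disfluency", "DIS");
        System.out.println(disfluency.getDestinationResults()); // {word=$1, disfluency=DIS}
    }
}
```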