Package nzilbb.bas

Class BAS


  • public class BAS
    extends Object
    Class exposing the BAS web services API for various speech processing and annotation tasks.

    https://clarin.phonetik.uni-muenchen.de/BASWebServices/#/services

    For service discovery, use links like http://clarin.phonetik.uni-muenchen.de/BASWebServices/BAS_Webservices.cmdi.xml

    The services supported here are:

    • G2P for converting orthographic transcript into phonemic transcription
    • MAUS for forced alignment given a WAV file and a phonemic transcription
    • MAUSBasic combines G2P and MAUS for forced alignment given a WAV file and a plain text orthographic transcript
    • Pho2Syl adding syllabification to phonemic transcriptions
    • TTS for transforming a transcript of German text into an audio file (Text-to-Speech)
    • TextAlign for aligning two representations of text, e.g. letters in orthographic transcript with phonemes in a phonemic transcription.

    To use the API, write code something like this:

     BAS bas = new BAS();
     BASResponse response = bas.MAUSBasic(
       "eng-NZ", new File("recording.wav"), new File("transcript.txt"));
     if (response.getSuccess()) {
       response.saveDownload(new File("Praat.TextGrid"));
     } else {
       System.out.println(response.getWarnings());
     }
     

    Input files can be supplied using an InputStream or a File. In some cases, a String can also be used as input.

    Author:
    Robert Fromont robert@fromont.net.nz
    • Constructor Detail

      • BAS

        public BAS()
            throws IOException
        Default constructor.
        Throws:
        IOException - If the ISO 639 resources could not be loaded.
    • Method Detail

      • getVersion

        public String getVersion()
        Version of the BAS services this API is designed for.
        Returns:
        Version of the BAS services this API is designed for.
      • getMAUSBasicUrl

        public String getMAUSBasicUrl()
        Getter for MAUSBasicUrl: URL for the MAUSBasic service - default: https://clarin.phonetik.uni-muenchen.de/BASWebServices/services/runMAUSBasic
        Returns:
        URL for the MAUSBasic service.
      • setMAUSBasicUrl

        public void setMAUSBasicUrl​(String newMAUSBasicUrl)
        Setter for MAUSBasicUrl: URL for the MAUSBasic service.
        Parameters:
        newMAUSBasicUrl - URL for the MAUSBasic service.
      • getG2PUrl

        public String getG2PUrl()
        Getter for G2PUrl: URL for the G2P service - default: http://clarin.phonetik.uni-muenchen.de/BASWebServices/services/runG2P
        Returns:
        URL for the G2P service.
      • setG2PUrl

        public void setG2PUrl​(String newG2PUrl)
        Setter for G2PUrl: URL for the G2P service.
        Parameters:
        newG2PUrl - URL for the G2P service.
      • getMAUSUrl

        public String getMAUSUrl()
        Getter for MAUSUrl: URL for the MAUS service - default: https://clarin.phonetik.uni-muenchen.de/BASWebServices/services/runMAUS
        Returns:
        URL for the MAUS service.
      • setMAUSUrl

        public void setMAUSUrl​(String newMAUSUrl)
        Setter for MAUSUrl: URL for the MAUS service.
        Parameters:
        newMAUSUrl - URL for the MAUS service.
      • getPho2SylUrl

        public String getPho2SylUrl()
        Getter for Pho2SylUrl: URL for the Pho2Syl service - default: http://clarin.phonetik.uni-muenchen.de/BASWebServices/services/runPho2Syl
        Returns:
        URL for the Pho2Syl service.
      • setPho2SylUrl

        public void setPho2SylUrl​(String newPho2SylUrl)
        Setter for Pho2SylUrl: URL for the Pho2Syl service.
        Parameters:
        newPho2SylUrl - URL for the Pho2Syl service.
      • getTTSUrl

        public String getTTSUrl()
        Getter for TTSUrl: URL for MaryTTS service - default: https://clarin.phonetik.uni-muenchen.de/BASWebServices/services/runTTSFile
        Returns:
        URL for MaryTTS service.
      • setTTSUrl

        public void setTTSUrl​(String newTTSUrl)
        Setter for TTSUrl: URL for MaryTTS service.
        Parameters:
        newTTSUrl - URL for MaryTTS service.
      • getTextAlignUrl

        public String getTextAlignUrl()
        Getter for TextAlignUrl: URL for the TextAlign service - default: http://clarin.phonetik.uni-muenchen.de/BASWebServices/services/runTextAlign
        Returns:
        URL for the TextAlign service.
      • setTextAlignUrl

        public void setTextAlignUrl​(String newTextAlignUrl)
        Setter for TextAlignUrl: URL for the TextAlign service.
        Parameters:
        newTextAlignUrl - URL for the TextAlign service.
      • MAUSBasic

        public BASResponse MAUSBasic​(String LANGUAGE,
                                     File SIGNAL,
                                     File TEXT)
                              throws IOException,
                                     ParserConfigurationException
        Invokes the MAUSBasic service, which combines G2P and MAUS for forced alignment given a WAV file and a plain text orthographic transcript.
        Parameters:
        LANGUAGE - RFC 5646 tag for identifying the language.
        SIGNAL - The signal, in WAV format.
        TEXT - The transcription of the text.
        Returns:
        The result of the call.
        Throws:
        IOException - If an IO error occurs.
        ParserConfigurationException - If the XML parser for parsing the response could not be configured.
      • G2P

        public BASResponse G2P​(String lng,
                               String txt,
                               String outsym,
                               String featset,
                               String oform,
                               boolean syl,
                               boolean stress)
                        throws IOException,
                               ParserConfigurationException
        Invokes the G2P service for converting orthography into phonemic transcription.

        This convenience method takes a String as the text and assumes iform = "txt". Use G2P(String,InputStream,String,String,int,String,String,String,boolean,boolean,boolean,boolean,String) for the full set of options.

        Parameters:
        lng - RFC 5646 tag for identifying the language.
        txt - The text to transform as a String.
        outsym - Output phoneme symbol inventory:
        • "sampa" - language-specific SAMPA variant (the default).
        • "x-sampa" - language-independent X-SAMPA.
        • "maus-sampa" - maps the output to a language-specific phoneme subset that WEBMAUS can process.
        • "ipa" - Unicode-encoded IPA.
        • "arpabet" - supported for eng-US only
        featset - Feature set used for grapheme-phoneme conversion.
        • "standard" comprises a letter window centered on the grapheme to be converted.
        • "extended" set additionally includes part of speech and morphological analyses.
        oform - Output format:
        • "bpf" indicates the BAS Partitur Format (BPF) file with a KAN tier.
        • "bpfs" differs from "bpf" only in that the phonemes are separated by blanks. For TextGrid input, both "bpf" and "bpfs" require the additional parameters "tgrate" and "tgitem". The content of the TextGrid tier "tgitem" is stored as a word chunk segmentation in the Partitur tier TRN.
        • "txt" replaces the input words with their transcriptions: single-line output without punctuation, where phonemes are separated by blanks and words by tabs.
        • "tab" returns the grapheme phoneme conversion result in form of a table with two columns. The first column comprises the words, the second column their blank-separated transcriptions.
        • "exttab" results in a 5-column table. The columns contain from left to right: words, transcriptions, part of speech, morpheme segmentations, and morpheme class segmentations.
        • "lex" transforms the table to a lexicon, i.e. words are unique and sorted.
        • "extlex" provides the same information as "exttab" in a unique and sorted manner. For all lex and tab outputs columns are separated by ';'.
        • "exttcf" which is currently available for German and English only additionally adds part of speech (STTS tagset), morphs, and morph classes.
        • With "tg" and "exttg" TextGrid output is produced.
        syl - whether or not the output transcription is to be syllabified.
        stress - whether or not word stress is to be added to the output transcription.
        Returns:
        The result of this call.
        Throws:
        IOException - If an IO error occurs.
        ParserConfigurationException - If the XML parser for parsing the response could not be configured.
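        The "tab" output format described above can be read back with a few lines of code. The following is a minimal sketch, assuming each line has exactly the form word;transcription with blank-separated phonemes, as the description states. The class G2PTabRow and its methods are hypothetical and not part of nzilbb.bas, and the sample input is invented for illustration rather than real service output.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical helper (not part of nzilbb.bas) for reading G2P "tab" output. */
class G2PTabRow {
  final String word;
  final List<String> phonemes;

  G2PTabRow(String word, List<String> phonemes) {
    this.word = word;
    this.phonemes = phonemes;
  }

  /** Parses one "word;p h o n e m e s" line; returns null if there is no ';'. */
  static G2PTabRow parseLine(String line) {
    int separator = line.indexOf(';');
    if (separator < 0) return null;
    String word = line.substring(0, separator).trim();
    String transcription = line.substring(separator + 1).trim();
    List<String> phonemes = new ArrayList<>();
    if (!transcription.isEmpty()) {
      // Phonemes are assumed to be separated by one or more blanks.
      phonemes.addAll(Arrays.asList(transcription.split("\\s+")));
    }
    return new G2PTabRow(word, phonemes);
  }

  /** Parses a whole "tab" result, skipping blank or malformed lines. */
  static List<G2PTabRow> parse(String tabOutput) {
    List<G2PTabRow> rows = new ArrayList<>();
    for (String line : tabOutput.split("\\R")) {
      G2PTabRow row = parseLine(line);
      if (row != null) rows.add(row);
    }
    return rows;
  }

  public static void main(String[] args) {
    // Invented sample text, for illustration only.
    for (G2PTabRow row : parse("the;D @\ncat;k { t\n")) {
      System.out.println(row.word + " -> " + row.phonemes);
    }
  }
}
```

        In practice the input text would come from the downloaded result of a G2P call with oform set to "tab"; how that text is obtained is outside this sketch.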
      • G2P

        public BASResponse G2P​(String lng,
                               File i,
                               String iform,
                               String outsym,
                               String featset,
                               String oform,
                               boolean syl,
                               boolean stress,
                               boolean nrm,
                               boolean com,
                               String align)
                        throws IOException,
                               ParserConfigurationException
        Invokes the G2P service for converting orthography into phonemic transcription.

        This method cannot have iform set to "tg". Use G2P(String,InputStream,String,String,int,String,String,String,boolean,boolean,boolean,boolean,String) for the full set of options.

        Parameters:
        lng - RFC 5646 tag for identifying the language.
        i - The text to transform.
        iform - The format of i -
        • "txt" indicates connected text input, which will be tokenized before the conversion.
        • "list" indicates a sequence of unconnected words, that does not need to be tokenized. Furthermore, "list" requires a different part-of-speech tagging strategy than "txt" for the extraction of the "extended" feature set (see Parameter featset).
        • "tcf" indicates that the input format is TCF containing at least a tokenization dominated by the element "tokens".
        • Input format "bpf" indicates BAS partitur file input containing an ORT tier to be transcribed.
        outsym - Output phoneme symbol inventory:
        • "sampa" - language-specific SAMPA variant (the default).
        • "x-sampa" - language-independent X-SAMPA.
        • "maus-sampa" - maps the output to a language-specific phoneme subset that WEBMAUS can process.
        • "ipa" - Unicode-encoded IPA.
        • "arpabet" - supported for eng-US only
        featset - Feature set used for grapheme-phoneme conversion.
        • "standard" comprises a letter window centered on the grapheme to be converted.
        • "extended" set additionally includes part of speech and morphological analyses.
        oform - Output format:
        • "bpf" indicates the BAS Partitur Format (BPF) file with a KAN tier.
        • "bpfs" differs from "bpf" only in that the phonemes are separated by blanks. For TextGrid input, both "bpf" and "bpfs" require the additional parameters "tgrate" and "tgitem". The content of the TextGrid tier "tgitem" is stored as a word chunk segmentation in the Partitur tier TRN.
        • "txt" replaces the input words with their transcriptions: single-line output without punctuation, where phonemes are separated by blanks and words by tabs.
        • "tab" returns the grapheme phoneme conversion result in form of a table with two columns. The first column comprises the words, the second column their blank-separated transcriptions.
        • "exttab" results in a 5-column table. The columns contain from left to right: words, transcriptions, part of speech, morpheme segmentations, and morpheme class segmentations.
        • "lex" transforms the table to a lexicon, i.e. words are unique and sorted.
        • "extlex" provides the same information as "exttab" in a unique and sorted manner. For all lex and tab outputs columns are separated by ';'.
        • "exttcf" which is currently available for German and English only additionally adds part of speech (STTS tagset), morphs, and morph classes.
        • With "tg" and "exttg" TextGrid output is produced.
        syl - whether or not the output transcription is to be syllabified.
        stress - whether or not word stress is to be added to the output transcription.
        nrm - Detects and expands 22 non-standard word types.
        com - whether <*> strings should be treated as annotation markers. If true, then strings of this type are considered as annotation markers that are not processed but passed on to the output.
        align - "yes", "no", or "sym": whether or not the transcription is to be letter-aligned. Syllable boundaries and word stress are not part of the output of the "sym" alignment.
        Returns:
        The result of this call.
        Throws:
        IOException - If an IO error occurs.
        ParserConfigurationException - If the XML parser for parsing the response could not be configured.
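        The relationship between the "tab" and "lex" output formats can be illustrated with a short sketch. It only demonstrates the idea that a lexicon holds each word once, in sorted order. The service's actual sort order and its handling of repeated words are not specified here, so the sketch assumes natural String ordering and keeps the first transcription seen. G2PLexiconSketch is a hypothetical name, not part of nzilbb.bas, and the sample rows are invented.

```java
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical illustration of turning "tab"-style rows into a "lex"-style lexicon. */
class G2PLexiconSketch {

  /** Collapses "word;transcription" rows into a sorted map with one entry per word. */
  static Map<String, String> toLexicon(String tabOutput) {
    // TreeMap keeps its keys in natural String order.
    Map<String, String> lexicon = new TreeMap<>();
    for (String line : tabOutput.split("\\R")) {
      int separator = line.indexOf(';');
      if (separator < 0) continue; // skip blank or malformed lines
      String word = line.substring(0, separator).trim();
      String transcription = line.substring(separator + 1).trim();
      // Assumption: when a word repeats, the first transcription wins.
      lexicon.putIfAbsent(word, transcription);
    }
    return lexicon;
  }

  public static void main(String[] args) {
    // Invented sample rows, for illustration only.
    Map<String, String> lexicon = toLexicon("the;D @\ncat;k { t\nthe;D i:\n");
    lexicon.forEach((word, transcription) ->
      System.out.println(word + ";" + transcription));
  }
}
```

        Requesting oform "lex" from the service directly avoids this step; the sketch is only meant to make the difference between the two formats concrete.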
      • G2P

        public BASResponse G2P​(String lng,
                               InputStream i,
                               String iform,
                               String outsym,
                               String featset,
                               String oform,
                               boolean syl,
                               boolean stress,
                               boolean nrm,
                               boolean com,
                               String align)
                        throws IOException,
                               ParserConfigurationException
        Invokes the G2P service for converting orthography into phonemic transcription.

        This method cannot have iform set to "tg". Use G2P(String,InputStream,String,String,int,String,String,String,boolean,boolean,boolean,boolean,String) for the full set of options.

        Parameters:
        lng - RFC 5646 tag for identifying the language.
        i - The text to transform.
        iform - The format of i -
        • "txt" indicates connected text input, which will be tokenized before the conversion.
        • "list" indicates a sequence of unconnected words, that does not need to be tokenized. Furthermore, "list" requires a different part-of-speech tagging strategy than "txt" for the extraction of the "extended" feature set (see Parameter featset).
        • "tcf" indicates that the input format is TCF containing at least a tokenization dominated by the element "tokens".
        • Input format "bpf" indicates BAS partitur file input containing an ORT tier to be transcribed.
        outsym - Output phoneme symbol inventory:
        • "sampa" - language-specific SAMPA variant (the default).
        • "x-sampa" - language-independent X-SAMPA.
        • "maus-sampa" - maps the output to a language-specific phoneme subset that WEBMAUS can process.
        • "ipa" - Unicode-encoded IPA.
        • "arpabet" - supported for eng-US only
        featset - Feature set used for grapheme-phoneme conversion.
        • "standard" comprises a letter window centered on the grapheme to be converted.
        • "extended" set additionally includes part of speech and morphological analyses.
        oform - Output format:
        • "bpf" indicates the BAS Partitur Format (BPF) file with a KAN tier.
        • "bpfs" differs from "bpf" only in that the phonemes are separated by blanks. For TextGrid input, both "bpf" and "bpfs" require the additional parameters "tgrate" and "tgitem". The content of the TextGrid tier "tgitem" is stored as a word chunk segmentation in the Partitur tier TRN.
        • "txt" replaces the input words with their transcriptions: single-line output without punctuation, where phonemes are separated by blanks and words by tabs.
        • "tab" returns the grapheme phoneme conversion result in form of a table with two columns. The first column comprises the words, the second column their blank-separated transcriptions.
        • "exttab" results in a 5-column table. The columns contain from left to right: words, transcriptions, part of speech, morpheme segmentations, and morpheme class segmentations.
        • "lex" transforms the table to a lexicon, i.e. words are unique and sorted.
        • "extlex" provides the same information as "exttab" in a unique and sorted manner. For all lex and tab outputs columns are separated by ';'.
        • "exttcf" which is currently available for German and English only additionally adds part of speech (STTS tagset), morphs, and morph classes.
        • With "tg" and "exttg" TextGrid output is produced.
        syl - whether or not the output transcription is to be syllabified.
        stress - whether or not word stress is to be added to the output transcription.
        nrm - Detects and expands 22 non-standard word types.
        com - whether <*> strings should be treated as annotation markers. If true, then strings of this type are considered as annotation markers that are not processed but passed on to the output.
        align - "yes", "no", or "sym": whether or not the transcription is to be letter-aligned. Syllable boundaries and word stress are not part of the output of the "sym" alignment.
        Returns:
        The result of this call.
        Throws:
        IOException - If an IO error occurs.
        ParserConfigurationException - If the XML parser for parsing the response could not be configured.
      • G2P

        public BASResponse G2P​(String lng,
                               InputStream i,
                               String iform,
                               String tgitem,
                               int tgrate,
                               String outsym,
                               String featset,
                               String oform,
                               boolean syl,
                               boolean stress,
                               boolean nrm,
                               boolean com,
                               String align)
                        throws IOException,
                               ParserConfigurationException
        Invokes the G2P service for converting orthography into phonemic transcription.
        Parameters:
        lng - RFC 5646 tag for identifying the language.
        i - The text to transform.
        iform - The format of i -
        • "txt" indicates connected text input, which will be tokenized before the conversion.
        • "list" indicates a sequence of unconnected words, that does not need to be tokenized. Furthermore, "list" requires a different part-of-speech tagging strategy than "txt" for the extraction of the "extended" feature set (see Parameter featset).
        • "tg" indicates TextGrid input. Long and short formats are supported. For TextGrid input, the name of the item containing the words to be transcribed must additionally be specified by the parameter "tgitem". In combination with "bpf" output format, "tg" input additionally requires the sample rate to be specified by the parameter "tgrate".
        • "tcf" indicates that the input format is TCF containing at least a tokenization dominated by the element "tokens".
        • Input format "bpf" indicates BAS partitur file input containing an ORT tier to be transcribed.
        tgitem - Only needed if iform is "tg". Name of the TextGrid item that contains the words to be transcribed. In case of TextGrid output, this item is the reference for the added items.
        tgrate - Only needed if iform is "tg" and oform is "bpf" or "bpfs". Sample rate used to convert time values from the TextGrid to sample values in the BAS Partitur file.
        outsym - Output phoneme symbol inventory:
        • "sampa" - language-specific SAMPA variant (the default).
        • "x-sampa" - language-independent X-SAMPA.
        • "maus-sampa" - maps the output to a language-specific phoneme subset that WEBMAUS can process.
        • "ipa" - Unicode-encoded IPA.
        • "arpabet" - supported for eng-US only
        featset - Feature set used for grapheme-phoneme conversion.
        • "standard" comprises a letter window centered on the grapheme to be converted.
        • "extended" set additionally includes part of speech and morphological analyses.
        oform - Output format:
        • "bpf" indicates the BAS Partitur Format (BPF) file with a KAN tier.
        • "bpfs" differs from "bpf" only in that the phonemes are separated by blanks. For TextGrid input, both "bpf" and "bpfs" require the additional parameters "tgrate" and "tgitem". The content of the TextGrid tier "tgitem" is stored as a word chunk segmentation in the Partitur tier TRN.
        • "txt" replaces the input words with their transcriptions: single-line output without punctuation, where phonemes are separated by blanks and words by tabs.
        • "tab" returns the grapheme phoneme conversion result in form of a table with two columns. The first column comprises the words, the second column their blank-separated transcriptions.
        • "exttab" results in a 5-column table. The columns contain from left to right: words, transcriptions, part of speech, morpheme segmentations, and morpheme class segmentations.
        • "lex" transforms the table to a lexicon, i.e. words are unique and sorted.
        • "extlex" provides the same information as "exttab" in a unique and sorted manner. For all lex and tab outputs columns are separated by ';'.
        • "exttcf" which is currently available for German and English only additionally adds part of speech (STTS tagset), morphs, and morph classes.
        • With "tg" and "exttg" TextGrid output is produced.
        syl - whether or not the output transcription is to be syllabified.
        stress - whether or not word stress is to be added to the output transcription.
        nrm - Detects and expands 22 non-standard word types.
        com - whether <*> strings should be treated as annotation markers. If true, then strings of this type are considered as annotation markers that are not processed but passed on to the output.
        align - "yes", "no", or "sym": whether or not the transcription is to be letter-aligned. Syllable boundaries and word stress are not part of the output of the "sym" alignment.
        Returns:
        The result of this call.
        Throws:
        IOException - If an IO error occurs.
        ParserConfigurationException - If the XML parser for parsing the response could not be configured.
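        The role of tgrate can be made concrete with a small calculation: TextGrid times are expressed in seconds, while BPF tiers use sample counts, so a time is multiplied by the sample rate. The sketch below assumes rounding to the nearest sample; the rounding actually used by the service is not documented here. TextGridTimeSketch and secondsToSamples are hypothetical names, not part of nzilbb.bas.

```java
/** Hypothetical illustration of converting TextGrid times to sample values. */
class TextGridTimeSketch {

  /** Converts a time in seconds to a sample index at the given sample rate. */
  static long secondsToSamples(double seconds, int tgrate) {
    if (tgrate <= 0) {
      throw new IllegalArgumentException("tgrate must be positive");
    }
    // Assumption: round to the nearest whole sample.
    return Math.round(seconds * tgrate);
  }

  public static void main(String[] args) {
    // A boundary at 1.5 seconds in a 16 kHz recording.
    System.out.println(secondsToSamples(1.5, 16000)); // prints 24000
  }
}
```

        The value passed as tgrate should therefore match the sample rate of the recording that the TextGrid annotates.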
      • MAUS

        public BASResponse MAUS​(String LANGUAGE,
                                File SIGNAL,
                                File BPF,
                                String OUTFORMAT,
                                String OUTSYMBOL)
                         throws IOException,
                                ParserConfigurationException
        Invoke the general MAUS service, with mostly default options, for forced alignment given a WAV file and a phonemic transcription.
        Parameters:
        LANGUAGE - RFC 5646 tag for identifying the language.
        SIGNAL - The signal, in WAV format.
        BPF - Phonemic transcription of the utterance to be segmented. Format is a BAS Partitur Format (BPF) file with a KAN tier.
        OUTFORMAT - Defines the output format:
        • "TextGrid" - a Praat-compatible TextGrid file
        • "par" or "mau-append" - the input BPF file with a new (or replaced) tier MAU
        • "csv" or "mau" - only the BPF MAU tier (CSV table)
        • "legacyEMU" - a file with extension *.EMU that contains in the first part the Emu hlb file (*.hlb) and in the second part the Emu phonetic segmentation (*.phonetic)
        • "emuR" - an Emu compatible *_annot.json file
        OUTSYMBOL - Defines the encoding of phonetic symbols in output.
        • "sampa" - (default), phonetic symbols are encoded in language specific SAM-PA (with some coding differences to official SAM-PA).
        • "ipa" - the service produces UTF-8 IPA output.
        • "manner" - the service produces IPA manner of articulation for each segment; possible values are: silence, vowel, diphthong, plosive, nasal, fricative, affricate, approximant, lateral-approximant, ejective.
        • "place" - the service produces IPA place of articulation for each segment; possible values are: silence, labial, dental, alveolar, post-alveolar, palatal, velar, uvular, glottal, front, central, back.
        Returns:
        The response to the request.
        Throws:
        IOException - If an IO error occurs.
        ParserConfigurationException - If the XML parser for parsing the response could not be configured.
      • MAUS

        public BASResponse MAUS​(String LANGUAGE,
                                File SIGNAL,
                                File BPF,
                                String OUTFORMAT,
                                String OUTSYMBOL,
                                Integer MINPAUSLEN,
                                Integer STARTWORD,
                                Integer ENDWORD,
                                File RULESET,
                                Integer MAUSSHIFT,
                                Double INSPROB,
                                Boolean INSKANTEXTGRID,
                                Boolean INSORTTEXTGRID,
                                Boolean USETRN,
                                Boolean NOINITIALFINALSILENCE,
                                Double WEIGHT,
                                String MODUS)
                         throws IOException,
                                ParserConfigurationException
        Invoke the general MAUS service, for forced alignment given a WAV file and a phonemic transcription.
        Parameters:
        LANGUAGE - RFC 5646 tag for identifying the language.
        SIGNAL - The signal, in WAV format.
        BPF - Phonemic transcription of the utterance to be segmented. Format is a BAS Partitur Format (BPF) file with a KAN tier.
        MINPAUSLEN - Controls the behaviour of optional inter-word silence. If set to 1, maus will detect all inter-word silence intervals that can be found (minimum length for a silence interval is then 10 msec = 1 frame). If set to values n>1, the minimum length for an inter-word silence interval to be detected is set to n×10 msec.
        STARTWORD - If set to a value n>0, this option causes maus to start the segmentation with the word number n (word numbering in BPF starts with 0).
        ENDWORD - If set to a value n<999999, this option causes maus to end the segmentation with the word number n (word numbering in BPF starts with 0).
        RULESET - MAUS rule set file; UTF-8 encoded; one rule per line; two different file types defined by the extension: '*.nrul' : phonological rule set without statistical information
        OUTFORMAT - Defines the output format:
        • "TextGrid" - a Praat-compatible TextGrid file
        • "par" or "mau-append" - the input BPF file with a new (or replaced) tier MAU
        • "csv" or "mau" - only the BPF MAU tier (CSV table)
        • "legacyEMU" - a file with extension *.EMU that contains in the first part the Emu hlb file (*.hlb) and in the second part the Emu phonetic segmentation (*.phonetic)
        • "emuR" - an Emu compatible *_annot.json file
        MAUSSHIFT - If set to n, this option causes the calculated MAUS segment boundaries to be shifted by n msec (default: 10) into the future.
        INSPROB - The option INSPROB influences the probability of deletion of segments. It is a constant factor (a constant value added to the log likelihood score) after each segment. Therefore, a higher value of INSPROB will cause the probability of segmentations with more segments to go up, thus decreasing the probability of deletions (and increasing the probability of insertions, which are rarely modelled in the rule sets).
        INSKANTEXTGRID - Switch to create an additional tier in the TextGrid output file with a word segmentation labelled with the canonic phonemic transcript.
        INSORTTEXTGRID - Switch to create an additional tier ORT in the TextGrid output file with a word segmentation labelled with the orthographic transcript (taken from the input ORT tier)
        USETRN - If set to true, the service searches the input BPF for a TRN tier. The synopsis for a TRN entry is: 'TRN: (start-sample) (duration-sample) (word-link-list) (label)', e.g. 'TRN: 23654 56432 0,1,2,3,4,5,6 sentence1' (the speech within the recording 'sentence1' starts with sample 23654, lasts for 56432 samples and covers the words 0-6). If only one TRN entry is found, the segmentation is restricted within a time range given by this TRN tier entry.
        OUTSYMBOL - Defines the encoding of phonetic symbols in output.
        • "sampa" - (default), phonetic symbols are encoded in language specific SAM-PA (with some coding differences to official SAM-PA).
        • "ipa" - the service produces UTF-8 IPA output.
        • "manner" - the service produces IPA manner of articulation for each segment; possible values are: silence, vowel, diphthong, plosive, nasal, fricative, affricate, approximant, lateral-approximant, ejective.
        • "place" - the service produces IPA place of articulation for each segment; possible values are: silence, labial, dental, alveolar, post-alveolar, palatal, velar, uvular, glottal, front, central, back.
        NOINITIALFINALSILENCE - Switch to suppress the automatic modeling of a leading/trailing silence interval.
        WEIGHT - Weights the influence of the statistical pronunciation model against the acoustical scores. More precisely, WEIGHT is multiplied by the pronunciation model score (log likelihood) before the score is added to the acoustical score within the search. Since the pronunciation model in most cases favors the canonical pronunciation, increasing WEIGHT will at some point cause MAUS to always choose the canonical pronunciation; lower values of WEIGHT will favor the selection of less probable paths according to acoustic evidence.
        MODUS - Operation modus of MAUS:
        • "standard" (default) - the segmentation and labelling using the MAUS technique as described in Schiel ICPhS 1999.
        • "align" - a forced alignment is performed on the input SAM-PA string defined in the KAN tier of the BPF.
        Returns:
        The response to the request.
        Throws:
        IOException - If an IO error occurs.
        ParserConfigurationException - If the XML parser for parsing the response could not be configured.
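        The TRN entry synopsis given for USETRN above can be illustrated with a small parser. This is a sketch under the assumption that fields are separated by whitespace and the word link list by commas, exactly as in the example entry; TrnEntrySketch is a hypothetical class, not part of nzilbb.bas, and it does not handle other BPF tiers.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical parser for a single BPF TRN entry, following the synopsis above. */
class TrnEntrySketch {
  final long startSample;
  final long durationSamples;
  final List<Integer> wordLinks;
  final String label;

  private TrnEntrySketch(long startSample, long durationSamples,
                         List<Integer> wordLinks, String label) {
    this.startSample = startSample;
    this.durationSamples = durationSamples;
    this.wordLinks = wordLinks;
    this.label = label;
  }

  /** Parses "TRN: (start-sample) (duration-sample) (word-link-list) (label)". */
  static TrnEntrySketch parse(String line) {
    // Limit of 5 keeps any blanks inside the label intact.
    String[] parts = line.trim().split("\\s+", 5);
    if (parts.length < 5 || !parts[0].equals("TRN:")) {
      throw new IllegalArgumentException("Not a TRN entry: " + line);
    }
    long start = Long.parseLong(parts[1]);
    long duration = Long.parseLong(parts[2]);
    List<Integer> links = new ArrayList<>();
    for (String link : parts[3].split(",")) {
      links.add(Integer.parseInt(link));
    }
    return new TrnEntrySketch(start, duration, links, parts[4]);
  }

  /** Start plus duration, i.e. the sample position where the chunk ends. */
  long endSample() {
    return startSample + durationSamples;
  }

  public static void main(String[] args) {
    TrnEntrySketch entry = parse("TRN: 23654 56432 0,1,2,3,4,5,6 sentence1");
    System.out.println(entry.label + ": samples " + entry.startSample
      + " to " + entry.endSample() + ", words " + entry.wordLinks);
  }
}
```

        With USETRN set to true and a single TRN entry like this one in the BPF, the segmentation would be restricted to the sample range the entry describes.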
      • MAUS

        public BASResponse MAUS​(String LANGUAGE,
                                InputStream SIGNAL,
                                InputStream BPF,
                                String OUTFORMAT,
                                String OUTSYMBOL,
                                Integer MINPAUSLEN,
                                Integer STARTWORD,
                                Integer ENDWORD,
                                InputStream RULESET,
                                Integer MAUSSHIFT,
                                Double INSPROB,
                                Boolean INSKANTEXTGRID,
                                Boolean INSORTTEXTGRID,
                                Boolean USETRN,
                                Boolean NOINITIALFINALSILENCE,
                                Double WEIGHT,
                                String MODUS)
                         throws IOException,
                                ParserConfigurationException
        Invoke the general MAUS service, for forced alignment given a WAV file and a phonemic transcription.
        Parameters:
        LANGUAGE - RFC 5646 tag for identifying the language.
        SIGNAL - The signal, in WAV format.
        BPF - Phonemic transcription of the utterance to be segmented. Format is a BAS Partitur Format (BPF) file with a KAN tier.
        MINPAUSLEN - Controls the behaviour of optional inter-word silence. If set to 1, maus will detect all inter-word silence intervals that can be found (minimum length for a silence interval is then 10 msec = 1 frame). If set to values n>1, the minimum length for an inter-word silence interval to be detected is set to n×10 msec.
        STARTWORD - If set to a value n>0, this option causes maus to start the segmentation with the word number n (word numbering in BPF starts with 0).
        ENDWORD - If set to a value n<999999, this option causes maus to end the segmentation with the word number n (word numbering in BPF starts with 0).
        RULESET - MAUS rule set file; UTF-8 encoded; one rule per line; two different file types defined by the extension: '*.nrul' : phonological rule set without statistical information
        OUTFORMAT - Defines the output format:
        • "TextGrid" - a praat compatible TextGrid file
        • "par" or "mau-append" - the input BPF file with a new (or replaced) tier MAU
        • "csv" or "mau" - only the BPF MAU tier (CSV table)
        • "legacyEMU" - a file with extension *.EMU that contains in the first part the Emu hlb file (*.hlb) and in the second part the Emu phonetic segmentation (*.phonetic)
        • "emuR" - an Emu compatible *_annot.json file
        MAUSSHIFT - If set to n, this option causes the calculated MAUS segment boundaries to be shifted by n msec (default: 10) into the future.
        INSPROB - Influences the probability of deletion of segments. It is a constant factor (a constant value added to the log likelihood score) applied after each segment. Therefore, a higher value of INSPROB raises the probability of segmentations with more segments, thus decreasing the probability of deletions (and increasing the probability of insertions, which are rarely modelled in the rule sets).
        INSKANTEXTGRID - Switch to create an additional tier in the TextGrid output file with a word segmentation labelled with the canonic phonemic transcript.
        INSORTTEXTGRID - Switch to create an additional tier ORT in the TextGrid output file with a word segmentation labelled with the orthographic transcript (taken from the input ORT tier)
        USETRN - If set to true, the service searches the input BPF for a TRN tier. The synopsis for a TRN entry is: 'TRN: (start-sample) (duration-sample) (word-link-list) (label)', e.g. 'TRN: 23654 56432 0,1,2,3,4,5,6 sentence1' (the speech within the recording 'sentence1' starts with sample 23654, lasts for 56432 samples and covers the words 0-6). If only one TRN entry is found, the segmentation is restricted to the time range given by this TRN tier entry.
        OUTSYMBOL - Defines the encoding of phonetic symbols in output.
        • "sampa" - (default), phonetic symbols are encoded in language specific SAM-PA (with some coding differences to official SAM-PA)
        • "ipa" - the service produces UTF-8 IPA output.
        • "manner" - the service produces IPA manner of articulation for each segment; possible values are: silence, vowel, diphthong, plosive, nasal, fricative, affricate, approximant, lateral-approximant, ejective.
        • "place" - the service produces IPA place of articulation for each segment; possible values are: silence, labial, dental, alveolar, post-alveolar, palatal, velar, uvular, glottal, front, central, back.
        NOINITIALFINALSILENCE - Switch to suppress the automatic modeling of a leading/trailing silence interval.
        WEIGHT - Weights the influence of the statistical pronunciation model against the acoustic scores. More precisely, WEIGHT is multiplied with the pronunciation model score (log likelihood) before that score is added to the acoustic score within the search. Since the pronunciation model in most cases favors the canonical pronunciation, increasing WEIGHT will at some point cause MAUS to always choose the canonical pronunciation; lower values of WEIGHT favor the selection of less probable paths according to acoustic evidence.
        MODUS - Operation modus of MAUS:
        • "standard" (default) - the segmentation and labelling using the MAUS technique as described in Schiel ICPhS 1999.
        • "align" - a forced alignment is performed on the input SAM-PA string defined in the KAN tier of the BPF.
        Returns:
        The response to the request.
        Throws:
        IOException - If an IO error occurs.
        ParserConfigurationException - If the XML parser for parsing the response could not be configured.
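        The TRN entry synopsis given for USETRN can be read field by field. The following is a minimal sketch of such a parser, assuming whitespace-separated fields and a comma-separated word link list as in the example entry; the class TrnEntry and its members are hypothetical and not part of this package.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for illustration; not part of nzilbb.bas.
final class TrnEntry {
    final long startSample;
    final long durationSamples;
    final List<Integer> wordLinks;
    final String label;

    private TrnEntry(long startSample, long durationSamples, List<Integer> wordLinks, String label) {
        this.startSample = startSample;
        this.durationSamples = durationSamples;
        this.wordLinks = wordLinks;
        this.label = label;
    }

    /** Parses a line of the form 'TRN: (start-sample) (duration-sample) (word-link-list) (label)'. */
    static TrnEntry parse(String line) {
        // Limit of 5 keeps any blanks inside the label in the last field.
        String[] parts = line.trim().split("\\s+", 5);
        if (parts.length < 5 || !parts[0].equals("TRN:")) {
            throw new IllegalArgumentException("Not a TRN entry: " + line);
        }
        List<Integer> links = new ArrayList<>();
        for (String link : parts[3].split(",")) {
            links.add(Integer.valueOf(link));
        }
        return new TrnEntry(Long.parseLong(parts[1]), Long.parseLong(parts[2]), links, parts[4]);
    }

    public static void main(String[] args) {
        TrnEntry entry = TrnEntry.parse("TRN: 23654 56432 0,1,2,3,4,5,6 sentence1");
        System.out.println(entry.label + " starts at sample " + entry.startSample
            + " and covers " + entry.wordLinks.size() + " words");
    }
}
```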
      • Pho2Syl

        public BASResponse Pho2Syl​(String lng,
                                   File i,
                                   String tier,
                                   Boolean wsync,
                                   String oform,
                                   Integer rate)
                            throws IOException,
                                   ParserConfigurationException
        Invoke the Pho2Syl service to syllabify a phonemic transcription.
        Parameters:
        lng - RFC 5646 tag for identifying the language.
        i - Phonemic transcription of the utterance to be segmented. Format is a BAS Partitur Format (BPF) file with a KAN tier.
        tier - Name of tier in the annotation file, whose content is to be syllabified.
        wsync - Whether each word boundary is considered a syllable boundary.
        oform - Output format:
        • "bpf" - BAS Partitur Format
        • "tg" - TextGrid format
        rate - Only needed if oform = "tg" (TextGrid); sample rate used to convert sample values from the BAS Partitur Format file to seconds in the TextGrid.
        Returns:
        The response to the request.
        Throws:
        IOException - If an IO error occurs.
        ParserConfigurationException - If the XML parser for parsing the response could not be configured.
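        The rate parameter exists because BPF tiers store positions as sample counts, while a TextGrid stores times in seconds. A minimal sketch of that conversion follows; the class and method names are hypothetical, and the 16000 Hz rate in the usage line is only an example value, not a default of the service.

```java
// Hypothetical helper for illustration; not part of nzilbb.bas.
final class SampleTime {
    /** Converts a sample count to seconds for the given sample rate in Hz. */
    static double samplesToSeconds(long samples, int rate) {
        if (rate <= 0) {
            throw new IllegalArgumentException("Sample rate must be positive: " + rate);
        }
        return (double) samples / rate;
    }

    public static void main(String[] args) {
        // Example only: sample 23654 of a recording assumed to be sampled at 16000 Hz.
        System.out.println(samplesToSeconds(23654, 16000) + " s");
    }
}
```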
      • Pho2Syl

        public BASResponse Pho2Syl​(String lng,
                                   InputStream i,
                                   String tier,
                                   Boolean wsync,
                                   String oform,
                                   Integer rate)
                            throws IOException,
                                   ParserConfigurationException
        Invoke the Pho2Syl service to syllabify a phonemic transcription.
        Parameters:
        lng - RFC 5646 tag for identifying the language.
        i - Phonemic transcription of the utterance to be segmented. Format is a BAS Partitur Format (BPF) file with a KAN tier.
        tier - Name of tier in the annotation file, whose content is to be syllabified.
        wsync - Whether each word boundary is considered a syllable boundary.
        oform - Output format:
        • "bpf" - BAS Partitur Format
        • "tg" - TextGrid format
        rate - Only needed if oform = "tg" (TextGrid); sample rate used to convert sample values from the BAS Partitur Format file to seconds in the TextGrid.
        Returns:
        The response to the request.
        Throws:
        IOException - If an IO error occurs.
        ParserConfigurationException - If the XML parser for parsing the response could not be configured.
      • TTS

        public BASResponse TTS​(String INPUT_TYPE,
                               String INPUT_TEXT,
                               String OUTPUT_TYPE,
                               String AUDIO,
                               String VOICE)
                        throws IOException,
                               ParserConfigurationException
        Invoke the MaryTTS German Text-to-speech service.
        Parameters:
        INPUT_TYPE - One of:
        • "TEXT"
        • "SIMPLEPHONEMES"
        • "SABLE"
        • "SSML"
        • "APML"
        • "PHONEMES"
        • "INTONATION"
        • "ACOUSTPARAMS"
        • "RAWMARYXML"
        • "TOKENS"
        • "WORDS"
        • "ALLOPHONES"
        • "REALISED_ACOUSTPARAMS"
        • "REALISED_DURATIONS"
        • "PRAAT_TEXTGRID"
        • "PARTSOFSPEECH"
        INPUT_TEXT - The text input.
        OUTPUT_TYPE - One of:
        • "PHONEMES"
        • "INTONATION"
        • "ACOUSTPARAMS"
        • "RAWMARYXML"
        • "TOKENS"
        • "WORDS"
        • "ALLOPHONES"
        • "REALISED_ACOUSTPARAMS"
        • "REALISED_DURATIONS"
        • "PRAAT_TEXTGRID"
        • "PARTSOFSPEECH"
        • "AUDIO"
        • "HALFPHONE_TARGETFEATURES"
        AUDIO - If OUTPUT_TYPE = "AUDIO", this can be one of:
        • "WAVE_FILE"
        • "AU_FILE"
        • "AIFF_FILE"
        VOICE - One of:
        • "bits4unitselautolabel"
        • "bits3unitselautolabel"
        • "bits3"
        • "bits2unitselautolabel"
        • "bits1unitselautolabel"
        • "bits4unitselautolabelhmm"
        • "bits3unitselautolabelhmm"
        • "bits3-hsmm"
        • "bits2unitselautolabelhmm"
        • "bits1unitselautolabelhmm"
        Returns:
        The response to the request.
        Throws:
        IOException - If an IO error occurs.
        ParserConfigurationException - If the XML parser for parsing the response could not be configured.
      • TTS

        public BASResponse TTS​(String INPUT_TYPE,
                               File INPUT_TEXT,
                               String OUTPUT_TYPE,
                               String AUDIO,
                               String VOICE)
                        throws IOException,
                               ParserConfigurationException
        Invoke the MaryTTS German Text-to-speech service.
        Parameters:
        INPUT_TYPE - One of:
        • "TEXT"
        • "SIMPLEPHONEMES"
        • "SABLE"
        • "SSML"
        • "APML"
        • "PHONEMES"
        • "INTONATION"
        • "ACOUSTPARAMS"
        • "RAWMARYXML"
        • "TOKENS"
        • "WORDS"
        • "ALLOPHONES"
        • "REALISED_ACOUSTPARAMS"
        • "REALISED_DURATIONS"
        • "PRAAT_TEXTGRID"
        • "PARTSOFSPEECH"
        INPUT_TEXT - The text input.
        OUTPUT_TYPE - One of:
        • "PHONEMES"
        • "INTONATION"
        • "ACOUSTPARAMS"
        • "RAWMARYXML"
        • "TOKENS"
        • "WORDS"
        • "ALLOPHONES"
        • "REALISED_ACOUSTPARAMS"
        • "REALISED_DURATIONS"
        • "PRAAT_TEXTGRID"
        • "PARTSOFSPEECH"
        • "AUDIO"
        • "HALFPHONE_TARGETFEATURES"
        AUDIO - If OUTPUT_TYPE = "AUDIO", this can be one of:
        • "WAVE_FILE"
        • "AU_FILE"
        • "AIFF_FILE"
        VOICE - One of:
        • "bits4unitselautolabel"
        • "bits3unitselautolabel"
        • "bits3"
        • "bits2unitselautolabel"
        • "bits1unitselautolabel"
        • "bits4unitselautolabelhmm"
        • "bits3unitselautolabelhmm"
        • "bits3-hsmm"
        • "bits2unitselautolabelhmm"
        • "bits1unitselautolabelhmm"
        Returns:
        The response to the request.
        Throws:
        IOException - If an IO error occurs.
        ParserConfigurationException - If the XML parser for parsing the response could not be configured.
      • TTS

        public BASResponse TTS​(String INPUT_TYPE,
                               InputStream INPUT_TEXT,
                               String OUTPUT_TYPE,
                               String AUDIO,
                               String VOICE)
                        throws IOException,
                               ParserConfigurationException
        Invoke the MaryTTS German Text-to-speech service.
        Parameters:
        INPUT_TYPE - One of:
        • "TEXT"
        • "SIMPLEPHONEMES"
        • "SABLE"
        • "SSML"
        • "APML"
        • "PHONEMES"
        • "INTONATION"
        • "ACOUSTPARAMS"
        • "RAWMARYXML"
        • "TOKENS"
        • "WORDS"
        • "ALLOPHONES"
        • "REALISED_ACOUSTPARAMS"
        • "REALISED_DURATIONS"
        • "PRAAT_TEXTGRID"
        • "PARTSOFSPEECH"
        INPUT_TEXT - The text input.
        OUTPUT_TYPE - One of:
        • "PHONEMES"
        • "INTONATION"
        • "ACOUSTPARAMS"
        • "RAWMARYXML"
        • "TOKENS"
        • "WORDS"
        • "ALLOPHONES"
        • "REALISED_ACOUSTPARAMS"
        • "REALISED_DURATIONS"
        • "PRAAT_TEXTGRID"
        • "PARTSOFSPEECH"
        • "AUDIO"
        • "HALFPHONE_TARGETFEATURES"
        AUDIO - If OUTPUT_TYPE = "AUDIO", this can be one of:
        • "WAVE_FILE"
        • "AU_FILE"
        • "AIFF_FILE"
        VOICE - One of:
        • "bits4unitselautolabel"
        • "bits3unitselautolabel"
        • "bits3"
        • "bits2unitselautolabel"
        • "bits1unitselautolabel"
        • "bits4unitselautolabelhmm"
        • "bits3unitselautolabelhmm"
        • "bits3-hsmm"
        • "bits2unitselautolabelhmm"
        • "bits1unitselautolabelhmm"
        Returns:
        The response to the request.
        Throws:
        IOException - If an IO error occurs.
        ParserConfigurationException - If the XML parser for parsing the response could not be configured.
      • TextAlign

        public BASResponse TextAlign​(File i,
                                     String cost)
                              throws IOException,
                                     ParserConfigurationException
        Convenience method to invoke the TextAlign service for aligning two representations of text, e.g. letters in orthographic transcript with phonemes in a phonemic transcription, using no cost file, and default options for displc and dir.
        Parameters:
        i - CSV text file with two semicolon-separated columns. Each row contains a sequence pair to be aligned. The sequence elements must be separated by a blank. Example: a word and its canonical transcription like S c h e r z;S E6 t s.
        cost - Cost function for the edit operations substitution, deletion, and insertion to be used for the alignment.
        • "naive" - assigns cost 1 to all operations except null-substitution, i.e. the substitution of a symbol by itself, which receives cost 0. This 'naive' cost function should be used only if the pairs to be aligned share the same vocabulary, which is NOT the case e.g. in grapheme-phoneme alignment (grapheme 'x' is not the same as phoneme 'x').
        • "g2p_deu", "g2p_eng" etc. are predefined cost functions for grapheme-phoneme alignment for the respective language, expressed as an ISO 639-3 code.
        • "intrinsic" - a cost function is trained on the input data and returned in the output zip. Costs are derived from co-occurrence probabilities, thus the bigger the input file, the more reliable the emerging cost function.
        • "import" - the user can provide their own cost function file, which must be a semicolon-separated 3-column CSV text file. Examples: v;w;0.7 - the substitution of 'v' by 'w' costs 0.7; v;_;0.8 - the deletion of 'v' costs 0.8; _;w;0.9 - the insertion of 'w' costs 0.9. A typical use case is to train a cost function on a big data set with cost='intrinsic', and to subsequently apply this cost function to smaller data sets with cost='import'.
        Returns:
        The response to the request.
        Throws:
        IOException - If an IO error occurs.
        ParserConfigurationException - If the XML parser for parsing the response could not be configured.
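        The input format described for i (two semicolon-separated columns, elements separated by a blank) can be produced with a small helper. This is a sketch under the assumption that the first column is a word split into single letters and the second a list of phoneme symbols; the class TextAlignInput and its method are hypothetical and not part of this package.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper for illustration; not part of nzilbb.bas.
final class TextAlignInput {
    /** Builds one input row: the letters of the word, a semicolon, then the phonemes, each blank-separated. */
    static String row(String word, List<String> phonemes) {
        List<String> letters = new ArrayList<>();
        // Iterate by code point so letters outside the Basic Multilingual Plane stay whole.
        for (int i = 0; i < word.length(); ) {
            int codePoint = word.codePointAt(i);
            letters.add(new String(Character.toChars(codePoint)));
            i += Character.charCount(codePoint);
        }
        return String.join(" ", letters) + ";" + String.join(" ", phonemes);
    }

    public static void main(String[] args) {
        // Reproduces the example row from the parameter description.
        System.out.println(row("Scherz", Arrays.asList("S", "E6", "t", "s")));
    }
}
```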
      • TextAlign

        public BASResponse TextAlign​(File i,
                                     String cost,
                                     File costfile,
                                     Boolean displc,
                                     String atype)
                              throws IOException,
                                     ParserConfigurationException
        Invoke the TextAlign service for aligning two representations of text, e.g. letters in orthographic transcript with phonemes in a phonemic transcription.
        Parameters:
        i - CSV text file with two semicolon-separated columns. Each row contains a sequence pair to be aligned. The sequence elements must be separated by a blank. Example: a word and its canonical transcription like S c h e r z;S E6 t s.
        cost - Cost function for the edit operations substitution, deletion, and insertion to be used for the alignment.
        • "naive" - assigns cost 1 to all operations except null-substitution, i.e. the substitution of a symbol by itself, which receives cost 0. This 'naive' cost function should be used only if the pairs to be aligned share the same vocabulary, which is NOT the case e.g. in grapheme-phoneme alignment (grapheme 'x' is not the same as phoneme 'x').
        • "g2p_deu", "g2p_eng" etc. are predefined cost functions for grapheme-phoneme alignment for the respective language, expressed as an ISO 639-3 code.
        • "intrinsic" - a cost function is trained on the input data and returned in the output zip. Costs are derived from co-occurrence probabilities, thus the bigger the input file, the more reliable the emerging cost function.
        • "import" - the user can provide their own cost function file, which must be a semicolon-separated 3-column CSV text file. Examples: v;w;0.7 - the substitution of 'v' by 'w' costs 0.7; v;_;0.8 - the deletion of 'v' costs 0.8; _;w;0.9 - the insertion of 'w' costs 0.9. A typical use case is to train a cost function on a big data set with cost='intrinsic', and to subsequently apply this cost function to smaller data sets with cost='import'.
        costfile - CSV text file with three semicolon-separated columns. Each row has the form a;b;c, where c denotes the cost for substituting a by b. Insertion and deletion are marked by an underscore.
        displc - Whether alignment costs should be displayed in a third column in the output file.
        atype - Alignment type:
        • "dir" - align the second column to the first.
        • "sym" - symmetric alignment.
        Returns:
        The response to the request.
        Throws:
        IOException - If an IO error occurs.
        ParserConfigurationException - If the XML parser for parsing the response could not be configured.
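        The cost file rows described for costfile (a;b;c, with an underscore marking insertion or deletion) can be interpreted as follows. This is a minimal sketch based only on the three examples given in the parameter description; the class CostRow and its members are hypothetical and not part of this package.

```java
// Hypothetical helper for illustration; not part of nzilbb.bas.
final class CostRow {
    enum Operation { SUBSTITUTION, DELETION, INSERTION }

    final String from;
    final String to;
    final double cost;
    final Operation operation;

    private CostRow(String from, String to, double cost, Operation operation) {
        this.from = from;
        this.to = to;
        this.cost = cost;
        this.operation = operation;
    }

    /** Parses one row of the form a;b;c, where '_' in the first or second column marks insertion or deletion. */
    static CostRow parse(String row) {
        String[] cols = row.trim().split(";");
        if (cols.length != 3) {
            throw new IllegalArgumentException("Expected 3 semicolon-separated columns: " + row);
        }
        String from = cols[0].trim();
        String to = cols[1].trim();
        Operation operation = Operation.SUBSTITUTION;
        if (to.equals("_")) {
            operation = Operation.DELETION;   // e.g. v;_;0.8 - 'v' is deleted
        } else if (from.equals("_")) {
            operation = Operation.INSERTION;  // e.g. _;w;0.9 - 'w' is inserted
        }
        return new CostRow(from, to, Double.parseDouble(cols[2].trim()), operation);
    }

    public static void main(String[] args) {
        for (String row : new String[] {"v;w;0.7", "v;_;0.8", "_;w;0.9"}) {
            CostRow parsed = parse(row);
            System.out.println(row + " -> " + parsed.operation + " at cost " + parsed.cost);
        }
    }
}
```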
      • TextAlign

        public BASResponse TextAlign​(InputStream i,
                                     String cost,
                                     InputStream costfile,
                                     Boolean displc,
                                     String atype)
                              throws IOException,
                                     ParserConfigurationException
        Invoke the TextAlign service for aligning two representations of text, e.g. letters in orthographic transcript with phonemes in a phonemic transcription.
        Parameters:
        i - CSV text file with two semicolon-separated columns. Each row contains a sequence pair to be aligned. The sequence elements must be separated by a blank. Example: a word and its canonical transcription like S c h e r z;S E6 t s.
        cost - Cost function for the edit operations substitution, deletion, and insertion to be used for the alignment.
        • "naive" - assigns cost 1 to all operations except null-substitution, i.e. the substitution of a symbol by itself, which receives cost 0. This 'naive' cost function should be used only if the pairs to be aligned share the same vocabulary, which is NOT the case e.g. in grapheme-phoneme alignment (grapheme 'x' is not the same as phoneme 'x').
        • "g2p_deu", "g2p_eng" etc. are predefined cost functions for grapheme-phoneme alignment for the respective language, expressed as an ISO 639-3 code.
        • "intrinsic" - a cost function is trained on the input data and returned in the output zip. Costs are derived from co-occurrence probabilities, thus the bigger the input file, the more reliable the emerging cost function.
        • "import" - the user can provide their own cost function file, which must be a semicolon-separated 3-column CSV text file. Examples: v;w;0.7 - the substitution of 'v' by 'w' costs 0.7; v;_;0.8 - the deletion of 'v' costs 0.8; _;w;0.9 - the insertion of 'w' costs 0.9. A typical use case is to train a cost function on a big data set with cost='intrinsic', and to subsequently apply this cost function to smaller data sets with cost='import'.
        costfile - CSV text file with three semicolon-separated columns. Each row has the form a;b;c, where c denotes the cost for substituting a by b. Insertion and deletion are marked by an underscore.
        displc - Whether alignment costs should be displayed in a third column in the output file.
        atype - Alignment type:
        • "dir" - align the second column to the first.
        • "sym" - symmetric alignment.
        Returns:
        The response to the request.
        Throws:
        IOException - If an IO error occurs.
        ParserConfigurationException - If the XML parser for parsing the response could not be configured.
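        To make the "naive" cost function concrete, the sketch below computes the total cost of the cheapest alignment of two symbol sequences when every substitution, deletion, and insertion costs 1 and substituting a symbol by itself costs 0. It uses the standard minimum edit distance dynamic programme as a stand-in; this is an assumption for illustration and not a description of how the TextAlign service is implemented. The class NaiveCost is hypothetical.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration; not part of nzilbb.bas and not the service's own algorithm.
final class NaiveCost {
    /** Minimum total cost of aligning source to target under the naive cost function. */
    static int alignmentCost(List<String> source, List<String> target) {
        int n = source.size();
        int m = target.size();
        int[][] d = new int[n + 1][m + 1];
        for (int i = 1; i <= n; i++) {
            d[i][0] = i; // delete the first i source symbols
        }
        for (int j = 1; j <= m; j++) {
            d[0][j] = j; // insert the first j target symbols
        }
        for (int i = 1; i <= n; i++) {
            for (int j = 1; j <= m; j++) {
                // Null-substitution (same symbol) costs 0, any other substitution costs 1.
                int substitution = d[i - 1][j - 1] + (source.get(i - 1).equals(target.get(j - 1)) ? 0 : 1);
                int deletion = d[i - 1][j] + 1;
                int insertion = d[i][j - 1] + 1;
                d[i][j] = Math.min(substitution, Math.min(deletion, insertion));
            }
        }
        return d[n][m];
    }

    public static void main(String[] args) {
        List<String> a = Arrays.asList("k", "i", "t", "t", "e", "n");
        List<String> b = Arrays.asList("s", "i", "t", "t", "i", "n", "g");
        System.out.println("naive alignment cost: " + alignmentCost(a, b));
    }
}
```

        Because the comparison is plain symbol equality, this only makes sense when both columns share one vocabulary, which is the restriction the "naive" option carries.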