Class BAS
- java.lang.Object
  - nzilbb.bas.BAS
public class BAS extends Object
Class exposing the BAS web services API for various speech processing and annotation tasks (https://clarin.phonetik.uni-muenchen.de/BASWebServices/#/services).
For service discovery, the service descriptions are published at URLs like http://clarin.phonetik.uni-muenchen.de/BASWebServices/BAS_Webservices.cmdi.xml
The services supported here are:
- G2P for converting orthographic transcript into phonemic transcription
- MAUS for forced alignment given a WAV file and a phonemic transcription
- MAUSBasic, which combines G2P and MAUS for forced alignment given a WAV file and a plain-text orthographic transcript
- Pho2Syl for adding syllabification to phonemic transcriptions
- TTS for transforming a transcript of German text into an audio file (Text-to-Speech)
- TextAlign for aligning two representations of text, e.g. letters in orthographic transcript with phonemes in a phonemic transcription.
To use the API, write code along these lines:

    BAS bas = new BAS();
    BASResponse response = bas.MAUSBasic(
      "eng-NZ", new File("recording.wav"), new File("transcript.txt"));
    if (response.getSuccess()) {
      response.saveDownload(new File("Praat.TextGrid"));
    } else {
      System.out.println(response.getWarnings());
    }

Input files can be supplied using an InputStream or a File. In some cases, a String can also be used as input.
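When the transcript is already in memory, the InputStream overloads (e.g. MAUSBasic(String, InputStream, InputStream)) can be fed without writing a temporary file. The sketch below is a hypothetical helper, not part of nzilbb.bas, and it assumes the service expects UTF-8 text.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

/** Hypothetical helper (not part of nzilbb.bas) for passing in-memory text to InputStream overloads. */
final class TranscriptStreams {
  private TranscriptStreams() {}

  /** Wraps a transcript as a UTF-8 encoded stream (the UTF-8 encoding is an assumption). */
  static InputStream fromString(String transcript) {
    return new ByteArrayInputStream(transcript.getBytes(StandardCharsets.UTF_8));
  }

  /** Reads a stream back into a String, e.g. to check what would be uploaded. */
  static String readAll(InputStream in) {
    try {
      return new String(in.readAllBytes(), StandardCharsets.UTF_8);
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }
}
```

A call might then look like bas.MAUSBasic("eng-NZ", new FileInputStream("recording.wav"), TranscriptStreams.fromString("hello world")).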
- Author:
- Robert Fromont robert@fromont.net.nz
Constructor Summary
- BAS(): Default constructor.
-
Method Summary
- BASResponse G2P(String lng, File i, String iform, String outsym, String featset, String oform, boolean syl, boolean stress, boolean nrm, boolean com, String align): Invokes the G2P service for converting orthography into phonemic transcription.
- BASResponse G2P(String lng, InputStream i, String iform, String tgitem, int tgrate, String outsym, String featset, String oform, boolean syl, boolean stress, boolean nrm, boolean com, String align): Invokes the G2P service for converting orthography into phonemic transcription.
- BASResponse G2P(String lng, InputStream i, String iform, String outsym, String featset, String oform, boolean syl, boolean stress, boolean nrm, boolean com, String align): Invokes the G2P service for converting orthography into phonemic transcription.
- BASResponse G2P(String lng, String txt, String outsym, String featset, String oform, boolean syl, boolean stress): Invokes the G2P service for converting orthography into phonemic transcription.
- String getG2PUrl(): Getter for G2PUrl: URL for the G2P service (default: http://clarin.phonetik.uni-muenchen.de/BASWebServices/services/runG2P)
- String getMAUSBasicUrl(): Getter for MAUSBasicUrl: URL for the MAUSBasic service (default: https://clarin.phonetik.uni-muenchen.de/BASWebServices/services/runMAUSBasic)
- String getMAUSUrl(): Getter for MAUSUrl: URL for the MAUS service (default: https://clarin.phonetik.uni-muenchen.de/BASWebServices/services/runMAUS)
- String getPho2SylUrl(): Getter for Pho2SylUrl: URL for the Pho2Syl service (default: http://clarin.phonetik.uni-muenchen.de/BASWebServices/services/runPho2Syl)
- String getTextAlignUrl(): Getter for TextAlignUrl: URL for the TextAlign service (default: http://clarin.phonetik.uni-muenchen.de/BASWebServices/services/runTextAlign)
- String getTTSUrl(): Getter for TTSUrl: URL for the MaryTTS service (default: https://clarin.phonetik.uni-muenchen.de/BASWebServices/services/runTTSFile)
- String getVersion(): Version of the BAS services this API is designed for.
- BASResponse MAUS(String LANGUAGE, File SIGNAL, File BPF, String OUTFORMAT, String OUTSYMBOL): Invokes the general MAUS service, with mostly default options, for forced alignment given a WAV file and a phonemic transcription.
- BASResponse MAUS(String LANGUAGE, File SIGNAL, File BPF, String OUTFORMAT, String OUTSYMBOL, Integer MINPAUSLEN, Integer STARTWORD, Integer ENDWORD, File RULESET, Integer MAUSSHIFT, Double INSPROB, Boolean INSKANTEXTGRID, Boolean INSORTTEXTGRID, Boolean USETRN, Boolean NOINITIALFINALSILENCE, Double WEIGHT, String MODUS): Invokes the general MAUS service for forced alignment given a WAV file and a phonemic transcription.
- BASResponse MAUS(String LANGUAGE, InputStream SIGNAL, InputStream BPF, String OUTFORMAT, String OUTSYMBOL, Integer MINPAUSLEN, Integer STARTWORD, Integer ENDWORD, InputStream RULESET, Integer MAUSSHIFT, Double INSPROB, Boolean INSKANTEXTGRID, Boolean INSORTTEXTGRID, Boolean USETRN, Boolean NOINITIALFINALSILENCE, Double WEIGHT, String MODUS): Invokes the general MAUS service for forced alignment given a WAV file and a phonemic transcription.
- BASResponse MAUSBasic(String LANGUAGE, File SIGNAL, File TEXT): Invokes the MAUSBasic service, which combines G2P and MAUS for forced alignment given a WAV file and a plain-text orthographic transcript.
- BASResponse MAUSBasic(String LANGUAGE, InputStream SIGNAL, InputStream TEXT): Invokes the MAUSBasic service, which combines G2P and MAUS for forced alignment given a WAV file and a plain-text orthographic transcript.
- BASResponse Pho2Syl(String lng, File i, String tier, Boolean wsync, String oform, Integer rate): Invokes the Pho2Syl service to syllabify a phonemic transcription.
- BASResponse Pho2Syl(String lng, InputStream i, String tier, Boolean wsync, String oform, Integer rate): Invokes the Pho2Syl service to syllabify a phonemic transcription.
- void setG2PUrl(String newG2PUrl): Setter for G2PUrl: URL for the G2P service.
- void setMAUSBasicUrl(String newMAUSBasicUrl): Setter for MAUSBasicUrl: URL for the MAUSBasic service.
- void setMAUSUrl(String newMAUSUrl): Setter for MAUSUrl: URL for the MAUS service.
- void setPho2SylUrl(String newPho2SylUrl): Setter for Pho2SylUrl: URL for the Pho2Syl service.
- void setTextAlignUrl(String newTextAlignUrl): Setter for TextAlignUrl: URL for the TextAlign service.
- void setTTSUrl(String newTTSUrl): Setter for TTSUrl: URL for the MaryTTS service.
- BASResponse TextAlign(File i, String cost): Convenience method to invoke the TextAlign service for aligning two representations of text, e.g. letters in an orthographic transcript with phonemes in a phonemic transcription.
- BASResponse TextAlign(File i, String cost, File costfile, Boolean displc, String atype): Invokes the TextAlign service for aligning two representations of text, e.g. letters in an orthographic transcript with phonemes in a phonemic transcription.
- BASResponse TextAlign(InputStream i, String cost, InputStream costfile, Boolean displc, String atype): Invokes the TextAlign service for aligning two representations of text, e.g. letters in an orthographic transcript with phonemes in a phonemic transcription.
- BASResponse TTS(String INPUT_TEXT): Convenience method to invoke the MaryTTS German text-to-speech service with plain-text input and a WAV file as output, using the default voice.
- BASResponse TTS(String INPUT_TYPE, File INPUT_TEXT, String OUTPUT_TYPE, String AUDIO, String VOICE): Invokes the MaryTTS German text-to-speech service.
- BASResponse TTS(String INPUT_TYPE, InputStream INPUT_TEXT, String OUTPUT_TYPE, String AUDIO, String VOICE): Invokes the MaryTTS German text-to-speech service.
- BASResponse TTS(String INPUT_TYPE, String INPUT_TEXT, String OUTPUT_TYPE, String AUDIO, String VOICE): Invokes the MaryTTS German text-to-speech service.
Constructor Detail
-
BAS
public BAS() throws IOException
Default constructor.
- Throws:
IOException - If the ISO 639 resources could not be loaded.
Method Detail
-
getVersion
public String getVersion()
Version of the BAS services this API is designed for.
- Returns:
- Version of the BAS services this API is designed for.
-
getMAUSBasicUrl
public String getMAUSBasicUrl()
Getter for MAUSBasicUrl: URL for the MAUSBasic service (default: https://clarin.phonetik.uni-muenchen.de/BASWebServices/services/runMAUSBasic)
- Returns:
- URL for the MAUSBasic service.
-
setMAUSBasicUrl
public void setMAUSBasicUrl(String newMAUSBasicUrl)
Setter for MAUSBasicUrl: URL for the MAUSBasic service.
- Parameters:
newMAUSBasicUrl - URL for the MAUSBasic service.
-
getG2PUrl
public String getG2PUrl()
Getter for G2PUrl: URL for the G2P service (default: http://clarin.phonetik.uni-muenchen.de/BASWebServices/services/runG2P)
- Returns:
- URL for the G2P service.
-
setG2PUrl
public void setG2PUrl(String newG2PUrl)
Setter for G2PUrl: URL for the G2P service.
- Parameters:
newG2PUrl - URL for the G2P service.
-
getMAUSUrl
public String getMAUSUrl()
Getter for MAUSUrl: URL for the MAUS service (default: https://clarin.phonetik.uni-muenchen.de/BASWebServices/services/runMAUS)
- Returns:
- URL for the MAUS service.
-
setMAUSUrl
public void setMAUSUrl(String newMAUSUrl)
Setter for MAUSUrl: URL for the MAUS service.
- Parameters:
newMAUSUrl - URL for the MAUS service.
-
getPho2SylUrl
public String getPho2SylUrl()
Getter for Pho2SylUrl: URL for the Pho2Syl service (default: http://clarin.phonetik.uni-muenchen.de/BASWebServices/services/runPho2Syl)
- Returns:
- URL for the Pho2Syl service.
-
setPho2SylUrl
public void setPho2SylUrl(String newPho2SylUrl)
Setter for Pho2SylUrl: URL for the Pho2Syl service.
- Parameters:
newPho2SylUrl - URL for the Pho2Syl service.
-
getTTSUrl
public String getTTSUrl()
Getter for TTSUrl: URL for the MaryTTS service (default: https://clarin.phonetik.uni-muenchen.de/BASWebServices/services/runTTSFile)
- Returns:
- URL for the MaryTTS service.
-
setTTSUrl
public void setTTSUrl(String newTTSUrl)
Setter for TTSUrl: URL for the MaryTTS service.
- Parameters:
newTTSUrl - URL for the MaryTTS service.
-
getTextAlignUrl
public String getTextAlignUrl()
Getter for TextAlignUrl: URL for the TextAlign service (default: http://clarin.phonetik.uni-muenchen.de/BASWebServices/services/runTextAlign)
- Returns:
- URL for the TextAlign service.
-
setTextAlignUrl
public void setTextAlignUrl(String newTextAlignUrl)
Setter for TextAlignUrl: URL for the TextAlign service.
- Parameters:
newTextAlignUrl - URL for the TextAlign service.
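All the default URLs listed above share the pattern base + "/run" + service name (with runTTSFile for TTS), although some use http and others https. If the services were hosted elsewhere, the setters could be fed from a single base URL. The helper below is hypothetical, not part of nzilbb.bas, and assumes an alternative host keeps the same endpoint names.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical helper: derives endpoint URLs for the BAS setters from one base URL. */
final class BasEndpoints {
  private BasEndpoints() {}

  /** Service name and endpoint suffix, taken from the documented default URLs. */
  static final String[][] SERVICES = {
    {"MAUSBasic", "runMAUSBasic"},
    {"G2P", "runG2P"},
    {"MAUS", "runMAUS"},
    {"Pho2Syl", "runPho2Syl"},
    {"TTS", "runTTSFile"},
    {"TextAlign", "runTextAlign"}
  };

  /** Maps each service name to base + endpoint, adding a trailing slash to the base if missing. */
  static Map<String, String> fromBase(String base) {
    String prefix = base.endsWith("/") ? base : base + "/";
    Map<String, String> urls = new LinkedHashMap<>();
    for (String[] service : SERVICES) {
      urls.put(service[0], prefix + service[1]);
    }
    return urls;
  }
}
```

Each entry could then be passed to the matching setter, e.g. bas.setG2PUrl(urls.get("G2P")).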
-
MAUSBasic
public BASResponse MAUSBasic(String LANGUAGE, File SIGNAL, File TEXT) throws IOException, ParserConfigurationException
Invokes the MAUSBasic service, which combines G2P and MAUS for forced alignment given a WAV file and a plain-text orthographic transcript.
- Parameters:
LANGUAGE - RFC 5646 tag for identifying the language.
SIGNAL - The signal, in WAV format.
TEXT - The orthographic transcript, as plain text.
- Returns:
- The result of the call.
- Throws:
IOException - If an IO error occurs.
ParserConfigurationException - If the XML parser for parsing the response could not be configured.
-
MAUSBasic
public BASResponse MAUSBasic(String LANGUAGE, InputStream SIGNAL, InputStream TEXT) throws IOException, ParserConfigurationException
Invokes the MAUSBasic service, which combines G2P and MAUS for forced alignment given a WAV file and a plain-text orthographic transcript.
- Parameters:
LANGUAGE - RFC 5646 tag for identifying the language.
SIGNAL - The signal, in WAV format.
TEXT - The orthographic transcript, as plain text.
- Returns:
- The result of the call.
- Throws:
IOException - If an IO error occurs.
ParserConfigurationException - If the XML parser for parsing the response could not be configured.
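Since SIGNAL must be in WAV format, a cheap local check can avoid a wasted upload. The sketch below is a hypothetical pre-flight helper, not part of nzilbb.bas: it only tests the standard RIFF/WAVE header ("RIFF" at offset 0, "WAVE" at offset 8) and does not validate the audio encoding itself.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

/** Hypothetical pre-flight check (not part of nzilbb.bas) for WAV input. */
final class WavCheck {
  private WavCheck() {}

  private static final byte[] RIFF = "RIFF".getBytes(StandardCharsets.US_ASCII);
  private static final byte[] WAVE = "WAVE".getBytes(StandardCharsets.US_ASCII);

  /**
   * Returns true if the first 12 bytes look like a RIFF/WAVE header:
   * "RIFF" at offsets 0-3, a chunk size at 4-7, and "WAVE" at 8-11.
   */
  static boolean looksLikeWav(byte[] header) {
    if (header == null || header.length < 12) {
      return false;
    }
    return Arrays.equals(Arrays.copyOfRange(header, 0, 4), RIFF)
        && Arrays.equals(Arrays.copyOfRange(header, 8, 12), WAVE);
  }
}
```

In practice, one would read the first 12 bytes of the file (for example with InputStream.readNBytes(12) inside a try-with-resources block) and skip the call if the check fails. The check is deliberately conservative: variants such as RF64 are rejected.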
-
G2P
public BASResponse G2P(String lng, String txt, String outsym, String featset, String oform, boolean syl, boolean stress) throws IOException, ParserConfigurationException
Invokes the G2P service for converting orthography into phonemic transcription. This convenience method takes the text as a String and assumes iform = "txt"; use G2P(String,InputStream,String,String,int,String,String,String,boolean,boolean,boolean,boolean,String) for the full set of options.
- Parameters:
lng - RFC 5646 tag for identifying the language.
txt - The text to transform, as a String.
outsym - Output phoneme symbol inventory:
- "sampa" - language-specific SAMPA variant (the default).
- "x-sampa" - language-independent X-SAMPA.
- "maus-sampa" - maps the output to a language-specific phoneme subset that WEBMAUS can process.
- "ipa" - Unicode-encoded IPA.
- "arpabet" - supported for eng-US only.
featset - Feature set used for grapheme-phoneme conversion:
- "standard" comprises a letter window centered on the grapheme to be converted.
- "extended" additionally includes part-of-speech and morphological analyses.
oform - Output format:
- "bpf" - a BAS Partitur Format (BPF) file with a KAN tier.
- "bpfs" - the same as "bpf", except that the phonemes are separated by blanks. For TextGrid input, both "bpf" and "bpfs" require the additional parameters "tgrate" and "tgitem"; the content of the TextGrid tier "tgitem" is stored as a word chunk segmentation in the partitur tier TRN.
- "txt" - the input words are replaced by their transcriptions: single-line output without punctuation, with phonemes separated by blanks and words by tabs.
- "tab" - the conversion result as a two-column table: the first column contains the words, the second their blank-separated transcriptions.
- "exttab" - a five-column table containing, from left to right: words, transcriptions, part of speech, morpheme segmentations, and morpheme class segmentations.
- "lex" - the table transformed into a lexicon, i.e. the words are unique and sorted.
- "extlex" - the same information as "exttab", with unique, sorted words. For all lex and tab outputs, columns are separated by ';'.
- "exttcf" - currently available for German and English only; additionally adds part of speech (STTS tagset), morphs, and morph classes.
- "tg" and "exttg" - TextGrid output.
syl - Whether or not the output transcription is to be syllabified.
stress - Whether or not word stress is to be added to the output transcription.
- Returns:
- The result of this call.
- Throws:
IOException - If an IO error occurs.
ParserConfigurationException - If the XML parser for parsing the response could not be configured.
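With oform = "tab" (or "lex"), each output line holds a word and its blank-separated transcription, with columns separated by ';'. The sketch below is a hypothetical parser for that two-column layout, not part of nzilbb.bas; it splits at the first ';' only, so it is not meant for the five-column "exttab"/"extlex" formats, and the sample SAMPA strings in the test are made up.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical parser (not part of nzilbb.bas) for G2P output with oform = "tab" or "lex". */
final class G2PTable {
  private G2PTable() {}

  /** One table row: the orthographic word and its phonemes. */
  static final class Row {
    final String word;
    final List<String> phonemes;

    Row(String word, List<String> phonemes) {
      this.word = word;
      this.phonemes = phonemes;
    }
  }

  /** Splits each non-blank line at the first ';', then splits the transcription on whitespace. */
  static List<Row> parse(String tabOutput) {
    List<Row> rows = new ArrayList<>();
    for (String line : tabOutput.split("\\R")) {
      if (line.isBlank()) {
        continue;
      }
      int sep = line.indexOf(';');
      if (sep < 0) {
        throw new IllegalArgumentException("no ';' in line: " + line);
      }
      String word = line.substring(0, sep);
      String transcription = line.substring(sep + 1).trim();
      List<String> phonemes = new ArrayList<>();
      if (!transcription.isEmpty()) {
        phonemes.addAll(Arrays.asList(transcription.split("\\s+")));
      }
      rows.add(new Row(word, phonemes));
    }
    return rows;
  }
}
```

The text to parse would typically come from the file saved with the response's saveDownload method.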
-
G2P
public BASResponse G2P(String lng, File i, String iform, String outsym, String featset, String oform, boolean syl, boolean stress, boolean nrm, boolean com, String align) throws IOException, ParserConfigurationException
Invokes the G2P service for converting orthography into phonemic transcription. This method does not support iform = "tg"; use G2P(String,InputStream,String,String,int,String,String,String,boolean,boolean,boolean,boolean,String) for the full set of options.
- Parameters:
lng - RFC 5646 tag for identifying the language.
i - The text to transform.
iform - The format of i:
- "txt" - connected text input, which will be tokenized before the conversion.
- "list" - a sequence of unconnected words that does not need to be tokenized. "list" also requires a different part-of-speech tagging strategy than "txt" for the extraction of the "extended" feature set (see parameter featset).
- "tcf" - TCF input containing at least a tokenization dominated by the element "tokens".
- "bpf" - BAS Partitur Format input containing an ORT tier to be transcribed.
outsym - Output phoneme symbol inventory:
- "sampa" - language-specific SAMPA variant (the default).
- "x-sampa" - language-independent X-SAMPA.
- "maus-sampa" - maps the output to a language-specific phoneme subset that WEBMAUS can process.
- "ipa" - Unicode-encoded IPA.
- "arpabet" - supported for eng-US only.
featset - Feature set used for grapheme-phoneme conversion:
- "standard" comprises a letter window centered on the grapheme to be converted.
- "extended" additionally includes part-of-speech and morphological analyses.
oform - Output format:
- "bpf" - a BAS Partitur Format (BPF) file with a KAN tier.
- "bpfs" - the same as "bpf", except that the phonemes are separated by blanks. For TextGrid input, both "bpf" and "bpfs" require the additional parameters "tgrate" and "tgitem"; the content of the TextGrid tier "tgitem" is stored as a word chunk segmentation in the partitur tier TRN.
- "txt" - the input words are replaced by their transcriptions: single-line output without punctuation, with phonemes separated by blanks and words by tabs.
- "tab" - the conversion result as a two-column table: the first column contains the words, the second their blank-separated transcriptions.
- "exttab" - a five-column table containing, from left to right: words, transcriptions, part of speech, morpheme segmentations, and morpheme class segmentations.
- "lex" - the table transformed into a lexicon, i.e. the words are unique and sorted.
- "extlex" - the same information as "exttab", with unique, sorted words. For all lex and tab outputs, columns are separated by ';'.
- "exttcf" - currently available for German and English only; additionally adds part of speech (STTS tagset), morphs, and morph classes.
- "tg" and "exttg" - TextGrid output.
syl - Whether or not the output transcription is to be syllabified.
stress - Whether or not word stress is to be added to the output transcription.
nrm - Whether to detect and expand 22 non-standard word types.
com - Whether <*> strings should be treated as annotation markers. If true, strings of this type are treated as annotation markers that are not processed but passed on to the output.
align - "yes", "no", or "sym": whether or not the transcription is to be letter-aligned. Syllable boundaries and word stress are not part of the output of the "sym" alignment.
- Returns:
- The result of this call.
- Throws:
IOException - If an IO error occurs.
ParserConfigurationException - If the XML parser for parsing the response could not be configured.
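With oform = "txt", the result is a single line in which words are separated by tabs and the phonemes within each word by blanks. The sketch below is a hypothetical parser for that layout, not part of nzilbb.bas; the sample transcriptions in the test are made up.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical parser (not part of nzilbb.bas) for G2P output with oform = "txt". */
final class G2PText {
  private G2PText() {}

  /** Returns one phoneme list per word: words are tab-separated, phonemes blank-separated. */
  static List<List<String>> parse(String txtOutput) {
    List<List<String>> words = new ArrayList<>();
    for (String word : txtOutput.strip().split("\t")) {
      String trimmed = word.strip();
      if (trimmed.isEmpty()) {
        continue; // tolerate doubled tabs or an empty result
      }
      words.add(Arrays.asList(trimmed.split(" +")));
    }
    return words;
  }
}
```

Because this format drops punctuation and the original words, it is mainly useful when the caller already knows the word sequence; "tab" keeps the words alongside their transcriptions.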
-
G2P
public BASResponse G2P(String lng, InputStream i, String iform, String outsym, String featset, String oform, boolean syl, boolean stress, boolean nrm, boolean com, String align) throws IOException, ParserConfigurationException
Invokes the G2P service for converting orthography into phonemic transcription. This method does not support iform = "tg"; use G2P(String,InputStream,String,String,int,String,String,String,boolean,boolean,boolean,boolean,String) for the full set of options.
- Parameters:
lng - RFC 5646 tag for identifying the language.
i - The text to transform.
iform - The format of i:
- "txt" - connected text input, which will be tokenized before the conversion.
- "list" - a sequence of unconnected words that does not need to be tokenized. "list" also requires a different part-of-speech tagging strategy than "txt" for the extraction of the "extended" feature set (see parameter featset).
- "tcf" - TCF input containing at least a tokenization dominated by the element "tokens".
- "bpf" - BAS Partitur Format input containing an ORT tier to be transcribed.
outsym - Output phoneme symbol inventory:
- "sampa" - language-specific SAMPA variant (the default).
- "x-sampa" - language-independent X-SAMPA.
- "maus-sampa" - maps the output to a language-specific phoneme subset that WEBMAUS can process.
- "ipa" - Unicode-encoded IPA.
- "arpabet" - supported for eng-US only.
featset - Feature set used for grapheme-phoneme conversion:
- "standard" comprises a letter window centered on the grapheme to be converted.
- "extended" additionally includes part-of-speech and morphological analyses.
oform - Output format:
- "bpf" - a BAS Partitur Format (BPF) file with a KAN tier.
- "bpfs" - the same as "bpf", except that the phonemes are separated by blanks. For TextGrid input, both "bpf" and "bpfs" require the additional parameters "tgrate" and "tgitem"; the content of the TextGrid tier "tgitem" is stored as a word chunk segmentation in the partitur tier TRN.
- "txt" - the input words are replaced by their transcriptions: single-line output without punctuation, with phonemes separated by blanks and words by tabs.
- "tab" - the conversion result as a two-column table: the first column contains the words, the second their blank-separated transcriptions.
- "exttab" - a five-column table containing, from left to right: words, transcriptions, part of speech, morpheme segmentations, and morpheme class segmentations.
- "lex" - the table transformed into a lexicon, i.e. the words are unique and sorted.
- "extlex" - the same information as "exttab", with unique, sorted words. For all lex and tab outputs, columns are separated by ';'.
- "exttcf" - currently available for German and English only; additionally adds part of speech (STTS tagset), morphs, and morph classes.
- "tg" and "exttg" - TextGrid output.
syl - Whether or not the output transcription is to be syllabified.
stress - Whether or not word stress is to be added to the output transcription.
nrm - Whether to detect and expand 22 non-standard word types.
com - Whether <*> strings should be treated as annotation markers. If true, strings of this type are treated as annotation markers that are not processed but passed on to the output.
align - "yes", "no", or "sym": whether or not the transcription is to be letter-aligned. Syllable boundaries and word stress are not part of the output of the "sym" alignment.
- Returns:
- The result of this call.
- Throws:
IOException - If an IO error occurs.
ParserConfigurationException - If the XML parser for parsing the response could not be configured.
-
G2P
public BASResponse G2P(String lng, InputStream i, String iform, String tgitem, int tgrate, String outsym, String featset, String oform, boolean syl, boolean stress, boolean nrm, boolean com, String align) throws IOException, ParserConfigurationException
Invokes the G2P service for converting orthography into phonemic transcription.
- Parameters:
lng - RFC 5646 tag for identifying the language.
i - The text to transform.
iform - The format of i:
- "txt" - connected text input, which will be tokenized before the conversion.
- "list" - a sequence of unconnected words that does not need to be tokenized. "list" also requires a different part-of-speech tagging strategy than "txt" for the extraction of the "extended" feature set (see parameter featset).
- "tg" - TextGrid input; both long and short formats are supported. For TextGrid input, the name of the item containing the words to be transcribed must also be specified, using the parameter tgitem. In combination with the "bpf" output format, "tg" input also requires the sample rate to be specified, using the parameter tgrate.
- "tcf" - TCF input containing at least a tokenization dominated by the element "tokens".
- "bpf" - BAS Partitur Format input containing an ORT tier to be transcribed.
tgitem - Only needed if iform is "tg". Name of the TextGrid item that contains the words to be transcribed. For TextGrid output, this item is the reference for the added items.
tgrate - Only needed if iform is "tg" and oform is "bpf" or "bpfs". Sample rate used to convert time values from the TextGrid into sample values in the BAS partitur file.
outsym - Output phoneme symbol inventory:
- "sampa" - language-specific SAMPA variant (the default).
- "x-sampa" - language-independent X-SAMPA.
- "maus-sampa" - maps the output to a language-specific phoneme subset that WEBMAUS can process.
- "ipa" - Unicode-encoded IPA.
- "arpabet" - supported for eng-US only.
featset - Feature set used for grapheme-phoneme conversion:
- "standard" comprises a letter window centered on the grapheme to be converted.
- "extended" additionally includes part-of-speech and morphological analyses.
oform - Output format:
- "bpf" - a BAS Partitur Format (BPF) file with a KAN tier.
- "bpfs" - the same as "bpf", except that the phonemes are separated by blanks. For TextGrid input, both "bpf" and "bpfs" require the additional parameters "tgrate" and "tgitem"; the content of the TextGrid tier "tgitem" is stored as a word chunk segmentation in the partitur tier TRN.
- "txt" - the input words are replaced by their transcriptions: single-line output without punctuation, with phonemes separated by blanks and words by tabs.
- "tab" - the conversion result as a two-column table: the first column contains the words, the second their blank-separated transcriptions.
- "exttab" - a five-column table containing, from left to right: words, transcriptions, part of speech, morpheme segmentations, and morpheme class segmentations.
- "lex" - the table transformed into a lexicon, i.e. the words are unique and sorted.
- "extlex" - the same information as "exttab", with unique, sorted words. For all lex and tab outputs, columns are separated by ';'.
- "exttcf" - currently available for German and English only; additionally adds part of speech (STTS tagset), morphs, and morph classes.
- "tg" and "exttg" - TextGrid output.
syl - Whether or not the output transcription is to be syllabified.
stress - Whether or not word stress is to be added to the output transcription.
nrm - Whether to detect and expand 22 non-standard word types.
com - Whether <*> strings should be treated as annotation markers. If true, strings of this type are treated as annotation markers that are not processed but passed on to the output.
align - "yes", "no", or "sym": whether or not the transcription is to be letter-aligned. Syllable boundaries and word stress are not part of the output of the "sym" alignment.
- Returns:
- The result of this call.
- Throws:
IOException - If an IO error occurs.
ParserConfigurationException - If the XML parser for parsing the response could not be configured.
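The tgrate parameter exists because TextGrid times are in seconds while BPF positions are in samples. The arithmetic is sketched below as a hypothetical helper, not part of nzilbb.bas; rounding to the nearest sample is an assumption, and the service may round differently.

```java
/** Hypothetical helper (not part of nzilbb.bas) illustrating the conversion tgrate enables. */
final class TgRate {
  private TgRate() {}

  /** TextGrid time in seconds to a sample offset, rounded to the nearest sample (assumed rounding). */
  static long secondsToSamples(double seconds, int tgrate) {
    if (tgrate <= 0) {
      throw new IllegalArgumentException("tgrate must be positive");
    }
    return Math.round(seconds * tgrate);
  }

  /** BPF sample offset back to TextGrid time in seconds. */
  static double samplesToSeconds(long samples, int tgrate) {
    if (tgrate <= 0) {
      throw new IllegalArgumentException("tgrate must be positive");
    }
    return (double) samples / tgrate;
  }
}
```

For a 16 kHz recording, a word starting at 1.5 s in the TextGrid corresponds to sample 24000 in the BPF, so tgrate should match the sample rate of the recording the TextGrid describes.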
-
MAUS
public BASResponse MAUS(String LANGUAGE, File SIGNAL, File BPF, String OUTFORMAT, String OUTSYMBOL) throws IOException, ParserConfigurationException
Invokes the general MAUS service, with mostly default options, for forced alignment given a WAV file and a phonemic transcription.
- Parameters:
LANGUAGE - RFC 5646 tag for identifying the language.
SIGNAL - The signal, in WAV format.
BPF - Phonemic transcription of the utterance to be segmented, as a BAS Partitur Format (BPF) file with a KAN tier.
OUTFORMAT - Defines the output format:
- "TextGrid" - a Praat-compatible TextGrid file
- "par" or "mau-append" - the input BPF file with a new (or replaced) tier MAU
- "csv" or "mau" - only the BPF MAU tier (CSV table)
- "legacyEMU" - a file with extension *.EMU whose first part contains the Emu hlb file (*.hlb) and whose second part contains the Emu phonetic segmentation (*.phonetic)
- "emuR" - an Emu-compatible *_annot.json file
OUTSYMBOL - Defines the encoding of phonetic symbols in the output:
- "sampa" - (default) phonetic symbols are encoded in language-specific SAM-PA (with some coding differences to official SAM-PA).
- "ipa" - the service produces UTF-8 IPA output.
- "manner" - the service produces the IPA manner of articulation for each segment; possible values are: silence, vowel, diphthong, plosive, nasal, fricative, affricate, approximant, lateral-approximant, ejective.
- "place" - the service produces the IPA place of articulation for each segment; possible values are: silence, labial, dental, alveolar, post-alveolar, palatal, velar, uvular, glottal, front, central, back.
- Returns:
- The response to the request.
- Throws:
IOException - If an IO error occurs.
ParserConfigurationException - If the XML parser for parsing the response could not be configured.
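MAUS needs its phonemic transcription as a BPF file with a KAN tier, with words numbered from 0. The sketch below is a hypothetical builder, not part of nzilbb.bas, that writes matching ORT and KAN tiers. It assumes the tier line layout "TIER: word-index label" (consistent with the TRN example in the USETRN description), and it does not generate any header lines the service may expect, so check the BPF documentation before relying on it.

```java
import java.util.List;

/** Hypothetical builder (not part of nzilbb.bas) for a minimal BPF body with ORT and KAN tiers. */
final class BpfBuilder {
  private BpfBuilder() {}

  /**
   * Emits one "ORT: n word" line per word, followed by one "KAN: n transcription" line per word,
   * numbering words from 0. Assumed layout; header lines are not generated.
   */
  static String build(List<String> words, List<String> canonical) {
    if (words.size() != canonical.size()) {
      throw new IllegalArgumentException("words and transcriptions differ in length");
    }
    StringBuilder ort = new StringBuilder();
    StringBuilder kan = new StringBuilder();
    for (int i = 0; i < words.size(); i++) {
      ort.append("ORT: ").append(i).append(' ').append(words.get(i)).append('\n');
      kan.append("KAN: ").append(i).append(' ').append(canonical.get(i)).append('\n');
    }
    return ort.append(kan).toString();
  }
}
```

The KAN labels would normally come from a G2P call with oform = "bpf" rather than being typed by hand; the ORT tier is included because INSORTTEXTGRID takes its labels from the input ORT tier.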
-
MAUS
public BASResponse MAUS(String LANGUAGE, File SIGNAL, File BPF, String OUTFORMAT, String OUTSYMBOL, Integer MINPAUSLEN, Integer STARTWORD, Integer ENDWORD, File RULESET, Integer MAUSSHIFT, Double INSPROB, Boolean INSKANTEXTGRID, Boolean INSORTTEXTGRID, Boolean USETRN, Boolean NOINITIALFINALSILENCE, Double WEIGHT, String MODUS) throws IOException, ParserConfigurationException
Invoke the general MAUS service, for forced alignment given a WAV file and a phonemic transcription.- Parameters:
LANGUAGE- RFC 5646 tag for identifying the language.SIGNAL- The signal, in WAV format.BPF- Phonemic transcription of the utterance to be segmented. Format is a BAS Partitur Format (BPF) file with a KAN tier.MINPAUSLEN- Controls the behaviour of optional inter-word silence. If set to 1, maus will detect all inter-word silence intervals that can be found (minimum length for a silence interval is then 10 msec = 1 frame). If set to values n>1, the minimum length for an inter-word silence interval to be detected is set to n×10 msec.STARTWORD- If set to a value n>0, this option causes maus to start the segmentation with the word number n (word numbering in BPF starts with 0).ENDWORD- If set to a value n<999999, this option causes maus to end the segmentation with the word number n (word numbering in BPF starts with 0).RULESET- MAUS rule set file; UTF-8 encoded; one rule per line; two different file types defined by the extension: '*.nrul' : phonological rule set without statistical informationOUTFORMAT- Defines the output format:- "TextGrid" - a praat compatible TextGrid file
- "par" or "mau-append" - the input BPF file with a new (or replaced) tier MAU
- "csv" or "mau" - only the BPF MAU tier (CSV table)
- "legacyEMU" - a file with extension *.EMU that contains in the first part the Emu hlb file (*.hlb) and in the second part the Emu phonetic segmentation (*.phonetic)
- emuR - an Emu compatible *_annot.json file
MAUSSHIFT- If set to n, this option causes the calculated MAUS segment boundaries to be shifted by n msec (default: 10) into the future.INSPROB- The option INSPROB influences the probability of deletion of segments. It is a constant factor (a constant value added to the log likelihood score) after each segment. Therefore, a higher value of INSPROB will cause the probability of segmentations with more segments go up, thus decreasing the probability of deletions (and increasing the probability of insertions, which are rarely modelled in the rule sets).INSKANTEXTGRID- Switch to create an additional tier in the TextGrid output file with a word segmentation labelled with the canonic phonemic transcript.INSORTTEXTGRID- Switch to create an additional tier ORT in the TextGrid output file with a word segmentation labelled with the orthographic transcript (taken from the input ORT tier)USETRN- If set to true, the service searches the input BPF for a TRN tier. The synopsis for a TRN entry is: 'TRN: (start-sample) (duration-sample) (word-link-list) (label)', e.g. 'TRN: 23654 56432 0,1,2,3,4,5,6 sentence1' (the speech within the recording 'sentence1' starts with sample 23654, last for 56432 samples and covers the words 0-6). If only one TRN entry is found, the segmentation is restricted within a time range given by this TRN tier entry.OUTSYMBOL- Defines the encoding of phonetic symbols in output.- "sampa" - (default), phonetic symbols are encoded in language specific SAM-PA (with some coding differences to official SAM-PA
- "ipa" - the service produces UTF-8 IPA output.
- "manner" - the service produces IPA manner of articulation for each segment; possible values are: silence, vowel, diphthong, plosive, nasal, fricative, affricate, approximant, lateral-approximant, ejective.
- "place" - the service produces IPA place of articulation for each segment; possible values are: silence, labial, dental, alveolar, post-alveolar, palatal, velar, uvular, glottal, front, central, back.
NOINITIALFINALSILENCE- Switch to suppress the automatic modeling on a leading/trailing silence interval.WEIGHT- weights the influence of the statistical pronunciation model against the acoustical scores. More precisely WEIGHT is multiplied to the pronunciation model score (log likelihood) before adding the score to the acoustical score within the search. Since the pronunciation model in most cases favors the canonical pronunciation, increasing WEIGHT will at some point cause MAUS to choose always the canonical pronunciation; lower values of WEIGHT will favor less probable paths be selected according to acoustic evidenceMODUS- Operation modus of MAUS:- "standard" (default) - the segmentation and labelling using the MAUS technique as described in Schiel ICPhS 1999.
- "align" - a forced alignment is performed on the input SAM-PA string defined in the KAN tier of the BPF.
- Returns:
- The response to the request.
- Throws:
IOException - If an IO error occurs.
ParserConfigurationException - If the XML parser for parsing the response could not be configured.
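The USETRN option relies on a TRN entry in the input BPF. The sketch below formats and parses such entries following the documented synopsis 'TRN: (start-sample) (duration-sample) (word-link-list) (label)'. The class TrnEntry is hypothetical (not part of nzilbb.bas), and accepting any whitespace between fields when parsing is an assumption, since the documented example only shows single spaces.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper, not part of nzilbb.bas: a single BPF TRN entry. */
final class TrnEntry {
  final long startSample;     // first sample of the speech
  final long durationSamples; // number of samples the speech lasts for
  final List<Integer> words;  // linked word numbers (BPF word numbering starts with 0)
  final String label;         // e.g. the recording name

  TrnEntry(long startSample, long durationSamples, List<Integer> words, String label) {
    this.startSample = startSample;
    this.durationSamples = durationSamples;
    this.words = List.copyOf(words);
    this.label = label;
  }

  /** Formats the entry as 'TRN: start duration w0,w1,... label'. */
  String format() {
    StringBuilder links = new StringBuilder();
    for (int i = 0; i < words.size(); i++) {
      if (i > 0) links.append(',');
      links.append(words.get(i));
    }
    return "TRN: " + startSample + " " + durationSamples + " " + links + " " + label;
  }

  /** Parses an entry; fields may be separated by any run of whitespace (assumption). */
  static TrnEntry parse(String line) {
    // limit 5: the label is everything after the word-link list
    String[] f = line.trim().split("\\s+", 5);
    if (f.length != 5 || !f[0].equals("TRN:")) {
      throw new IllegalArgumentException("Not a TRN entry: " + line);
    }
    List<Integer> words = new ArrayList<>();
    for (String w : f[3].split(",")) {
      words.add(Integer.parseInt(w));
    }
    return new TrnEntry(Long.parseLong(f[1]), Long.parseLong(f[2]), words, f[4]);
  }
}
```

Parsing the documented example yields start sample 23654, a duration of 56432 samples and word links 0 to 6, and format() reproduces the original string.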
-
MAUS
public BASResponse MAUS(String LANGUAGE, InputStream SIGNAL, InputStream BPF, String OUTFORMAT, String OUTSYMBOL, Integer MINPAUSLEN, Integer STARTWORD, Integer ENDWORD, InputStream RULESET, Integer MAUSSHIFT, Double INSPROB, Boolean INSKANTEXTGRID, Boolean INSORTTEXTGRID, Boolean USETRN, Boolean NOINITIALFINALSILENCE, Double WEIGHT, String MODUS) throws IOException, ParserConfigurationException
Invoke the general MAUS service for forced alignment, given a WAV file and a phonemic transcription.
- Parameters:
LANGUAGE - RFC 5646 tag identifying the language.
SIGNAL - The signal, in WAV format.
BPF - Phonemic transcription of the utterance to be segmented. Format is a BAS Partitur Format (BPF) file with a KAN tier.
MINPAUSLEN - Controls the behaviour of optional inter-word silence. If set to 1, MAUS will detect all inter-word silence intervals that can be found (the minimum length for a silence interval is then 10 msec = 1 frame). If set to values n>1, the minimum length for an inter-word silence interval to be detected is set to n×10 msec.
STARTWORD - If set to a value n>0, this option causes MAUS to start the segmentation with word number n (word numbering in BPF starts with 0).
ENDWORD - If set to a value n<999999, this option causes MAUS to end the segmentation with word number n (word numbering in BPF starts with 0).
RULESET - MAUS rule set file; UTF-8 encoded; one rule per line; two different file types defined by the extension: '*.nrul' : phonological rule set without statistical information
OUTFORMAT - Defines the output format:
- "TextGrid" - a Praat-compatible TextGrid file
- "par" or "mau-append" - the input BPF file with a new (or replaced) tier MAU
- "csv" or "mau" - only the BPF MAU tier (CSV table)
- "legacyEMU" - a file with extension *.EMU that contains in the first part the Emu hlb file (*.hlb) and in the second part the Emu phonetic segmentation (*.phonetic)
- "emuR" - an Emu-compatible *_annot.json file
MAUSSHIFT - If set to n, this option causes the calculated MAUS segment boundaries to be shifted by n msec (default: 10) into the future.
INSPROB - The option INSPROB influences the probability of deletion of segments. It is a constant term (a constant value added to the log likelihood score) after each segment. Therefore, a higher value of INSPROB will cause the probability of segmentations with more segments to go up, thus decreasing the probability of deletions (and increasing the probability of insertions, which are rarely modelled in the rule sets).
INSKANTEXTGRID - Switch to create an additional tier in the TextGrid output file with a word segmentation labelled with the canonical phonemic transcript.
INSORTTEXTGRID - Switch to create an additional tier ORT in the TextGrid output file with a word segmentation labelled with the orthographic transcript (taken from the input ORT tier).
USETRN - If set to true, the service searches the input BPF for a TRN tier. The synopsis for a TRN entry is: 'TRN: (start-sample) (duration-sample) (word-link-list) (label)', e.g. 'TRN: 23654 56432 0,1,2,3,4,5,6 sentence1' (the speech within the recording 'sentence1' starts with sample 23654, lasts for 56432 samples and covers the words 0-6). If only one TRN entry is found, the segmentation is restricted to the time range given by this TRN tier entry.
OUTSYMBOL - Defines the encoding of phonetic symbols in the output:
- "sampa" (default) - phonetic symbols are encoded in language-specific SAM-PA (with some coding differences from official SAM-PA).
- "ipa" - the service produces UTF-8 IPA output.
- "manner" - the service produces IPA manner of articulation for each segment; possible values are: silence, vowel, diphthong, plosive, nasal, fricative, affricate, approximant, lateral-approximant, ejective.
- "place" - the service produces IPA place of articulation for each segment; possible values are: silence, labial, dental, alveolar, post-alveolar, palatal, velar, uvular, glottal, front, central, back.
NOINITIALFINALSILENCE - Switch to suppress the automatic modeling of a leading/trailing silence interval.
WEIGHT - Weights the influence of the statistical pronunciation model against the acoustic scores. More precisely, the pronunciation model score (log likelihood) is multiplied by WEIGHT before it is added to the acoustic score within the search. Since the pronunciation model in most cases favors the canonical pronunciation, increasing WEIGHT will at some point cause MAUS to always choose the canonical pronunciation; lower values of WEIGHT allow less probable paths to be selected according to acoustic evidence.
MODUS - Operation mode of MAUS:
- "standard" (default) - segmentation and labelling using the MAUS technique as described in Schiel ICPhS 1999.
- "align" - a forced alignment is performed on the input SAM-PA string defined in the KAN tier of the BPF.
- Returns:
- The response to the request.
- Throws:
IOException - If an IO error occurs.
ParserConfigurationException - If the XML parser for parsing the response could not be configured.
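MAUS expects the BPF argument to contain a KAN tier, and STARTWORD/ENDWORD refer to word numbers that start at 0. Below is a minimal sketch of producing such a file in memory. BpfBuilder is a hypothetical helper; the header lines (LHD, SAM, LBD), the inclusion of an ORT tier and the tab-separated layout are assumptions about the BAS Partitur Format that should be checked against the BPF specification before relying on them.

```java
import java.util.List;

/** Hypothetical helper, not part of nzilbb.bas: writes minimal BPF text with ORT and KAN tiers. */
final class BpfBuilder {

  /** One word: its orthographic form and its canonical (KAN) transcription. */
  static final class Word {
    final String orthography;
    final String kan;

    Word(String orthography, String kan) {
      this.orthography = orthography;
      this.kan = kan;
    }
  }

  /**
   * Builds BPF text. Words are numbered from 0, the numbering that STARTWORD,
   * ENDWORD and TRN word links refer to. Header keys and layout are assumptions.
   */
  static String build(int sampleRate, List<Word> words) {
    StringBuilder bpf = new StringBuilder();
    bpf.append("LHD: Partitur 1.3\n");
    bpf.append("SAM: ").append(sampleRate).append('\n');
    bpf.append("LBD:\n"); // assumed end-of-header marker
    for (int i = 0; i < words.size(); i++) { // orthographic tier
      bpf.append("ORT:\t").append(i).append('\t').append(words.get(i).orthography).append('\n');
    }
    for (int i = 0; i < words.size(); i++) { // canonical phonemic tier required by MAUS
      bpf.append("KAN:\t").append(i).append('\t').append(words.get(i).kan).append('\n');
    }
    return bpf.toString();
  }
}
```

The resulting text could be wrapped in an InputStream and passed as the BPF argument of MAUS(...); the SAM-PA transcriptions a caller supplies would normally come from G2P rather than being typed by hand.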
-
Pho2Syl
public BASResponse Pho2Syl(String lng, File i, String tier, Boolean wsync, String oform, Integer rate) throws IOException, ParserConfigurationException
Invoke the Pho2Syl service to syllabify a phonemic transcription.
- Parameters:
lng - RFC 5646 tag identifying the language.
i - Phonemic transcription of the utterance to be syllabified. Format is a BAS Partitur Format (BPF) file with a KAN tier.
tier - Name of the tier in the annotation file whose content is to be syllabified.
wsync - Whether each word boundary is considered a syllable boundary.
oform - Output format:
- "bpf" - BAS Partitur Format
- "tg" - TextGrid format
rate - Only needed if oform = "tg" (TextGrid); the sample rate used to convert sample values in the BAS Partitur Format file to seconds in the TextGrid.
- Returns:
- The response to the request.
- Throws:
IOException - If an IO error occurs.
ParserConfigurationException - If the XML parser for parsing the response could not be configured.
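The rate parameter converts BPF sample values into TextGrid seconds, i.e. seconds = samples / rate. A small sketch of that arithmetic with a hypothetical SampleTime helper; how a BPF duration maps onto an interval end depends on the conventions of each tier, which this sketch does not model.

```java
/** Hypothetical helper, not part of nzilbb.bas: converts between BPF sample offsets and seconds. */
final class SampleTime {

  /** Seconds corresponding to a sample offset at the given sample rate (Hz). */
  static double toSeconds(long sample, int rate) {
    if (rate <= 0) throw new IllegalArgumentException("rate must be positive: " + rate);
    return (double) sample / rate;
  }

  /** Nearest sample offset for a time in seconds, e.g. going from a TextGrid back to BPF. */
  static long toSample(double seconds, int rate) {
    if (rate <= 0) throw new IllegalArgumentException("rate must be positive: " + rate);
    return Math.round(seconds * rate);
  }
}
```

Assuming a 16000 Hz recording, the start sample 23654 from the TRN example in the MAUS documentation corresponds to 23654 / 16000 = 1.478375 s.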
-
Pho2Syl
public BASResponse Pho2Syl(String lng, InputStream i, String tier, Boolean wsync, String oform, Integer rate) throws IOException, ParserConfigurationException
Invoke the Pho2Syl service to syllabify a phonemic transcription.
- Parameters:
lng - RFC 5646 tag identifying the language.
i - Phonemic transcription of the utterance to be syllabified. Format is a BAS Partitur Format (BPF) file with a KAN tier.
tier - Name of the tier in the annotation file whose content is to be syllabified.
wsync - Whether each word boundary is considered a syllable boundary.
oform - Output format:
- "bpf" - BAS Partitur Format
- "tg" - TextGrid format
rate - Only needed if oform = "tg" (TextGrid); the sample rate used to convert sample values in the BAS Partitur Format file to seconds in the TextGrid.
- Returns:
- The response to the request.
- Throws:
IOException - If an IO error occurs.
ParserConfigurationException - If the XML parser for parsing the response could not be configured.
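This overload takes the BPF as an InputStream, so a transcription held in memory need not be written to disk first. A sketch of wrapping text as a stream, assuming the service expects UTF-8 (the MAUS documentation states this for rule set files; for BPF input it is an assumption). InMemoryInput is a hypothetical name.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

/** Hypothetical helper, not part of nzilbb.bas: in-memory text as request input. */
final class InMemoryInput {

  /** Wraps text (e.g. BPF content) as a UTF-8 encoded InputStream. */
  static InputStream of(String text) {
    return new ByteArrayInputStream(text.getBytes(StandardCharsets.UTF_8));
  }

  /** Reads a stream fully back into a String, e.g. to inspect what would be sent. */
  static String readAll(InputStream in) throws IOException {
    return new String(in.readAllBytes(), StandardCharsets.UTF_8);
  }
}
```

A stream from InMemoryInput.of(bpfText) could then be supplied as the i argument. A stream can only be consumed once, so a fresh one would be needed for each request.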
-
TTS
public BASResponse TTS(String INPUT_TEXT) throws IOException, ParserConfigurationException
Convenience method to invoke the MaryTTS German Text-to-Speech service with plain text input, producing a WAV file as output, using the default voice.
- Parameters:
INPUT_TEXT - The text input.
- Returns:
- The response to the request.
- Throws:
IOException - If an IO error occurs.
ParserConfigurationException - If the XML parser for parsing the response could not be configured.
-
TTS
public BASResponse TTS(String INPUT_TYPE, String INPUT_TEXT, String OUTPUT_TYPE, String AUDIO, String VOICE) throws IOException, ParserConfigurationException
Invoke the MaryTTS German Text-to-Speech service.
- Parameters:
INPUT_TYPE - One of:
- "TEXT"
- "SIMPLEPHONEMES"
- "SABLE"
- "SSML"
- "APML"
- "PHONEMES"
- "INTONATION"
- "ACOUSTPARAMS"
- "RAWMARYXML"
- "TOKENS"
- "WORDS"
- "ALLOPHONES"
- "REALISED_ACOUSTPARAMS"
- "REALISED_DURATIONS"
- "PRAAT_TEXTGRID"
- "PARTSOFSPEECH"
INPUT_TEXT - The text input.
OUTPUT_TYPE - One of:
- "PHONEMES"
- "INTONATION"
- "ACOUSTPARAMS"
- "RAWMARYXML"
- "TOKENS"
- "WORDS"
- "ALLOPHONES"
- "REALISED_ACOUSTPARAMS"
- "REALISED_DURATIONS"
- "PRAAT_TEXTGRID"
- "PARTSOFSPEECH"
- "AUDIO"
- "HALFPHONE_TARGETFEATURES"
AUDIO - If OUTPUT_TYPE = "AUDIO", this can be one of:
- "WAVE_FILE"
- "AU_FILE"
- "AIFF_FILE"
VOICE - One of:
- "bits4unitselautolabel"
- "bits3unitselautolabel"
- "bits3"
- "bits2unitselautolabel"
- "bits1unitselautolabel"
- "bits4unitselautolabelhmm"
- "bits3unitselautolabelhmm"
- "bits3-hsmm"
- "bits2unitselautolabelhmm"
- "bits1unitselautolabelhmm"
- Returns:
- The response to the request.
- Throws:
IOException - If an IO error occurs.
ParserConfigurationException - If the XML parser for parsing the response could not be configured.
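Because INPUT_TYPE, OUTPUT_TYPE, AUDIO and VOICE are plain strings, a typo only shows up as a failed request. The following sketch checks arguments against the lists documented above before calling TTS(...). TtsArgs is hypothetical, and its sets are copied from this page; the MaryTTS service itself may accept values not listed here, or no longer offer some of them.

```java
import java.util.Set;

/** Hypothetical helper, not part of nzilbb.bas: validates TTS arguments against the documented lists. */
final class TtsArgs {
  static final Set<String> INPUT_TYPES = Set.of(
      "TEXT", "SIMPLEPHONEMES", "SABLE", "SSML", "APML", "PHONEMES", "INTONATION",
      "ACOUSTPARAMS", "RAWMARYXML", "TOKENS", "WORDS", "ALLOPHONES",
      "REALISED_ACOUSTPARAMS", "REALISED_DURATIONS", "PRAAT_TEXTGRID", "PARTSOFSPEECH");
  static final Set<String> OUTPUT_TYPES = Set.of(
      "PHONEMES", "INTONATION", "ACOUSTPARAMS", "RAWMARYXML", "TOKENS", "WORDS",
      "ALLOPHONES", "REALISED_ACOUSTPARAMS", "REALISED_DURATIONS", "PRAAT_TEXTGRID",
      "PARTSOFSPEECH", "AUDIO", "HALFPHONE_TARGETFEATURES");
  static final Set<String> AUDIO_TYPES = Set.of("WAVE_FILE", "AU_FILE", "AIFF_FILE");
  static final Set<String> VOICES = Set.of(
      "bits4unitselautolabel", "bits3unitselautolabel", "bits3", "bits2unitselautolabel",
      "bits1unitselautolabel", "bits4unitselautolabelhmm", "bits3unitselautolabelhmm",
      "bits3-hsmm", "bits2unitselautolabelhmm", "bits1unitselautolabelhmm");

  /** Throws IllegalArgumentException if any value is not in the documented lists. */
  static void check(String inputType, String outputType, String audio, String voice) {
    require(INPUT_TYPES, inputType, "INPUT_TYPE");
    require(OUTPUT_TYPES, outputType, "OUTPUT_TYPE");
    if ("AUDIO".equals(outputType)) {
      require(AUDIO_TYPES, audio, "AUDIO"); // AUDIO only matters for audio output
    }
    require(VOICES, voice, "VOICE");
  }

  private static void require(Set<String> allowed, String value, String name) {
    // null is checked first: Set.of(...).contains(null) throws NullPointerException
    if (value == null || !allowed.contains(value)) {
      throw new IllegalArgumentException(name + " not supported: " + value);
    }
  }
}
```

AUDIO is only checked when OUTPUT_TYPE is "AUDIO", mirroring the parameter description. VOICE is always required by this sketch, even though the convenience overload relies on the default voice.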
-
TTS
public BASResponse TTS(String INPUT_TYPE, File INPUT_TEXT, String OUTPUT_TYPE, String AUDIO, String VOICE) throws IOException, ParserConfigurationException
Invoke the MaryTTS German Text-to-Speech service.
- Parameters:
INPUT_TYPE - One of:
- "TEXT"
- "SIMPLEPHONEMES"
- "SABLE"
- "SSML"
- "APML"
- "PHONEMES"
- "INTONATION"
- "ACOUSTPARAMS"
- "RAWMARYXML"
- "TOKENS"
- "WORDS"
- "ALLOPHONES"
- "REALISED_ACOUSTPARAMS"
- "REALISED_DURATIONS"
- "PRAAT_TEXTGRID"
- "PARTSOFSPEECH"
INPUT_TEXT - The text input.
OUTPUT_TYPE - One of:
- "PHONEMES"
- "INTONATION"
- "ACOUSTPARAMS"
- "RAWMARYXML"
- "TOKENS"
- "WORDS"
- "ALLOPHONES"
- "REALISED_ACOUSTPARAMS"
- "REALISED_DURATIONS"
- "PRAAT_TEXTGRID"
- "PARTSOFSPEECH"
- "AUDIO"
- "HALFPHONE_TARGETFEATURES"
AUDIO - If OUTPUT_TYPE = "AUDIO", this can be one of:
- "WAVE_FILE"
- "AU_FILE"
- "AIFF_FILE"
VOICE - One of:
- "bits4unitselautolabel"
- "bits3unitselautolabel"
- "bits3"
- "bits2unitselautolabel"
- "bits1unitselautolabel"
- "bits4unitselautolabelhmm"
- "bits3unitselautolabelhmm"
- "bits3-hsmm"
- "bits2unitselautolabelhmm"
- "bits1unitselautolabelhmm"
- Returns:
- The response to the request.
- Throws:
IOException - If an IO error occurs.
ParserConfigurationException - If the XML parser for parsing the response could not be configured.
-
TTS
public BASResponse TTS(String INPUT_TYPE, InputStream INPUT_TEXT, String OUTPUT_TYPE, String AUDIO, String VOICE) throws IOException, ParserConfigurationException
Invoke the MaryTTS German Text-to-Speech service.
- Parameters:
INPUT_TYPE - One of:
- "TEXT"
- "SIMPLEPHONEMES"
- "SABLE"
- "SSML"
- "APML"
- "PHONEMES"
- "INTONATION"
- "ACOUSTPARAMS"
- "RAWMARYXML"
- "TOKENS"
- "WORDS"
- "ALLOPHONES"
- "REALISED_ACOUSTPARAMS"
- "REALISED_DURATIONS"
- "PRAAT_TEXTGRID"
- "PARTSOFSPEECH"
INPUT_TEXT - The text input.
OUTPUT_TYPE - One of:
- "PHONEMES"
- "INTONATION"
- "ACOUSTPARAMS"
- "RAWMARYXML"
- "TOKENS"
- "WORDS"
- "ALLOPHONES"
- "REALISED_ACOUSTPARAMS"
- "REALISED_DURATIONS"
- "PRAAT_TEXTGRID"
- "PARTSOFSPEECH"
- "AUDIO"
- "HALFPHONE_TARGETFEATURES"
AUDIO - If OUTPUT_TYPE = "AUDIO", this can be one of:
- "WAVE_FILE"
- "AU_FILE"
- "AIFF_FILE"
VOICE - One of:
- "bits4unitselautolabel"
- "bits3unitselautolabel"
- "bits3"
- "bits2unitselautolabel"
- "bits1unitselautolabel"
- "bits4unitselautolabelhmm"
- "bits3unitselautolabelhmm"
- "bits3-hsmm"
- "bits2unitselautolabelhmm"
- "bits1unitselautolabelhmm"
- Returns:
- The response to the request.
- Throws:
IOException - If an IO error occurs.
ParserConfigurationException - If the XML parser for parsing the response could not be configured.
-
TextAlign
public BASResponse TextAlign(File i, String cost) throws IOException, ParserConfigurationException
Convenience method to invoke the TextAlign service for aligning two representations of text, e.g. letters in an orthographic transcript with phonemes in a phonemic transcription, using no cost file and default options for displc and atype.
- Parameters:
i - CSV text file with two semicolon-separated columns. Each row contains a sequence pair to be aligned. The sequence elements must be separated by a blank. Example: a word and its canonical transcription, like S c h e r z;S E6 t s.
cost - Cost function for the edit operations substitution, deletion, and insertion to be used for the alignment:
- "naive" - assigns cost 1 to all operations except null-substitution, i.e. the substitution of a symbol by itself, which receives cost 0. This 'naive' cost function should be used only if the pairs to be aligned share the same vocabulary, which is NOT the case e.g. in grapheme-phoneme alignment (grapheme 'x' is not the same as phoneme 'x').
- "g2p_deu", "g2p_eng" etc. - predefined cost functions for grapheme-phoneme alignment for the respective language, identified by its ISO 639-3 code.
- "intrinsic" - a cost function is trained on the input data and returned in the output ZIP file. Costs are derived from co-occurrence probabilities, so the bigger the input file, the more reliable the resulting cost function.
- "import" - the user can provide their own cost function file, which must be a semicolon-separated 3-column CSV text file. Examples: v;w;0.7 - the substitution of 'v' by 'w' costs 0.7; v;_;0.8 - the deletion of 'v' costs 0.8; _;w;0.9 - the insertion of 'w' costs 0.9. A typical use case is to train a cost function on a big data set with cost='intrinsic', and subsequently apply this cost function to smaller data sets with cost='import'.
- Returns:
- The response to the request.
- Throws:
IOException - If an IO error occurs.
ParserConfigurationException - If the XML parser for parsing the response could not be configured.
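The input file i holds one pair per row, with elements separated by blanks and the two sequences separated by a semicolon, as in S c h e r z;S E6 t s. A sketch of building such rows from a word and its phoneme symbols follows. TextAlignInput is a hypothetical helper; it rejects words or symbols containing a blank or semicolon because those would corrupt the row, and joining rows with newlines is an assumption.

```java
import java.util.List;

/** Hypothetical helper, not part of nzilbb.bas: builds rows of the TextAlign input CSV. */
final class TextAlignInput {

  /** Separates the letters of a word by blanks, e.g. "Scherz" becomes "S c h e r z". */
  static String letters(String word) {
    StringBuilder sb = new StringBuilder();
    word.codePoints().forEach(cp -> { // code points, so non-BMP characters stay whole
      if (sb.length() > 0) sb.append(' ');
      sb.appendCodePoint(cp);
    });
    return sb.toString();
  }

  /** One row: blank-separated letters, a semicolon, then the blank-separated phonemes. */
  static String row(String word, List<String> phonemes) {
    if (word.isEmpty() || word.contains(" ") || word.contains(";")) {
      throw new IllegalArgumentException("Invalid word: '" + word + "'");
    }
    for (String p : phonemes) {
      // a blank or semicolon inside a symbol would change the row structure
      if (p.isEmpty() || p.contains(" ") || p.contains(";")) {
        throw new IllegalArgumentException("Invalid phoneme symbol: '" + p + "'");
      }
    }
    return letters(word) + ";" + String.join(" ", phonemes);
  }

  /** Joins rows into file content, one pair per line. */
  static String file(List<String> rows) {
    return rows.isEmpty() ? "" : String.join("\n", rows) + "\n";
  }
}
```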
-
TextAlign
public BASResponse TextAlign(File i, String cost, File costfile, Boolean displc, String atype) throws IOException, ParserConfigurationException
Invoke the TextAlign service for aligning two representations of text, e.g. letters in an orthographic transcript with phonemes in a phonemic transcription.
- Parameters:
i - CSV text file with two semicolon-separated columns. Each row contains a sequence pair to be aligned. The sequence elements must be separated by a blank. Example: a word and its canonical transcription, like S c h e r z;S E6 t s.
cost - Cost function for the edit operations substitution, deletion, and insertion to be used for the alignment:
- "naive" - assigns cost 1 to all operations except null-substitution, i.e. the substitution of a symbol by itself, which receives cost 0. This 'naive' cost function should be used only if the pairs to be aligned share the same vocabulary, which is NOT the case e.g. in grapheme-phoneme alignment (grapheme 'x' is not the same as phoneme 'x').
- "g2p_deu", "g2p_eng" etc. - predefined cost functions for grapheme-phoneme alignment for the respective language, identified by its ISO 639-3 code.
- "intrinsic" - a cost function is trained on the input data and returned in the output ZIP file. Costs are derived from co-occurrence probabilities, so the bigger the input file, the more reliable the resulting cost function.
- "import" - the user can provide their own cost function file, which must be a semicolon-separated 3-column CSV text file. Examples: v;w;0.7 - the substitution of 'v' by 'w' costs 0.7; v;_;0.8 - the deletion of 'v' costs 0.8; _;w;0.9 - the insertion of 'w' costs 0.9. A typical use case is to train a cost function on a big data set with cost='intrinsic', and subsequently apply this cost function to smaller data sets with cost='import'.
costfile - CSV text file with three semicolon-separated columns. Each row has the form a;b;c, where c denotes the cost of substituting a by b. Insertion and deletion are marked by an underscore.
displc - Whether alignment costs should be displayed in a third column in the output file.
atype - Alignment type:
- "dir" - align the second column to the first.
- "sym" - symmetric alignment.
- Returns:
- The response to the request.
- Throws:
IOException - If an IO error occurs.
ParserConfigurationException - If the XML parser for parsing the response could not be configured.
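The costfile rows have the form a;b;c, with an underscore standing for the missing symbol in insertions and deletions. The sketch below reproduces the documented examples (v;w;0.7, v;_;0.8 and _;w;0.9) with a hypothetical CostFile builder; it does not validate symbols, so a semicolon inside a symbol would corrupt the file.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper, not part of nzilbb.bas: builds the text of a TextAlign cost file. */
final class CostFile {
  private static final String GAP = "_"; // the missing side of an insertion or deletion
  private final List<String> rows = new ArrayList<>();

  /** Cost of substituting symbol a by symbol b. */
  CostFile substitution(String a, String b, double cost) { return add(a, b, cost); }

  /** Cost of deleting symbol a. */
  CostFile deletion(String a, double cost) { return add(a, GAP, cost); }

  /** Cost of inserting symbol b. */
  CostFile insertion(String b, double cost) { return add(GAP, b, cost); }

  private CostFile add(String a, String b, double cost) {
    // Double.toString is locale-independent, so the decimal separator is always '.'
    rows.add(a + ";" + b + ";" + Double.toString(cost));
    return this;
  }

  /** The file content: one a;b;c row per line. */
  String text() {
    return rows.isEmpty() ? "" : String.join("\n", rows) + "\n";
  }
}
```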
-
TextAlign
public BASResponse TextAlign(InputStream i, String cost, InputStream costfile, Boolean displc, String atype) throws IOException, ParserConfigurationException
Invoke the TextAlign service for aligning two representations of text, e.g. letters in an orthographic transcript with phonemes in a phonemic transcription.
- Parameters:
i - CSV text file with two semicolon-separated columns. Each row contains a sequence pair to be aligned. The sequence elements must be separated by a blank. Example: a word and its canonical transcription, like S c h e r z;S E6 t s.
cost - Cost function for the edit operations substitution, deletion, and insertion to be used for the alignment:
- "naive" - assigns cost 1 to all operations except null-substitution, i.e. the substitution of a symbol by itself, which receives cost 0. This 'naive' cost function should be used only if the pairs to be aligned share the same vocabulary, which is NOT the case e.g. in grapheme-phoneme alignment (grapheme 'x' is not the same as phoneme 'x').
- "g2p_deu", "g2p_eng" etc. - predefined cost functions for grapheme-phoneme alignment for the respective language, identified by its ISO 639-3 code.
- "intrinsic" - a cost function is trained on the input data and returned in the output ZIP file. Costs are derived from co-occurrence probabilities, so the bigger the input file, the more reliable the resulting cost function.
- "import" - the user can provide their own cost function file, which must be a semicolon-separated 3-column CSV text file. Examples: v;w;0.7 - the substitution of 'v' by 'w' costs 0.7; v;_;0.8 - the deletion of 'v' costs 0.8; _;w;0.9 - the insertion of 'w' costs 0.9. A typical use case is to train a cost function on a big data set with cost='intrinsic', and subsequently apply this cost function to smaller data sets with cost='import'.
costfile - CSV text file with three semicolon-separated columns. Each row has the form a;b;c, where c denotes the cost of substituting a by b. Insertion and deletion are marked by an underscore.
displc - Whether alignment costs should be displayed in a third column in the output file.
atype - Alignment type:
- "dir" - align the second column to the first.
- "sym" - symmetric alignment.
- Returns:
- The response to the request.
- Throws:
IOException - If an IO error occurs.
ParserConfigurationException - If the XML parser for parsing the response could not be configured.
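With the "naive" cost function, every substitution, deletion and insertion costs 1 and a null-substitution costs 0, so the cheapest alignment of a pair costs exactly the Levenshtein distance between the two sequences. The sketch computes that distance with the standard dynamic program over blank-separated elements. It only illustrates what the cost function means: NaiveAlignmentCost is hypothetical, it is not the TextAlign implementation, and it returns the cost rather than the alignment itself.

```java
/** Hypothetical illustration, not part of nzilbb.bas: minimum alignment cost under "naive" costs. */
final class NaiveAlignmentCost {

  /** Minimum total cost to turn sequence a into sequence b (Levenshtein distance). */
  static int cost(String[] a, String[] b) {
    int[][] d = new int[a.length + 1][b.length + 1];
    for (int i = 0; i <= a.length; i++) d[i][0] = i; // delete the first i elements of a
    for (int j = 0; j <= b.length; j++) d[0][j] = j; // insert the first j elements of b
    for (int i = 1; i <= a.length; i++) {
      for (int j = 1; j <= b.length; j++) {
        int sub = a[i - 1].equals(b[j - 1]) ? 0 : 1; // null-substitution is free
        d[i][j] = Math.min(d[i - 1][j - 1] + sub,
            Math.min(d[i - 1][j] + 1,    // deletion
                     d[i][j - 1] + 1));  // insertion
      }
    }
    return d[a.length][b.length];
  }

  /** Overload for blank-separated sequences, as in the TextAlign input file. */
  static int cost(String a, String b) {
    return cost(split(a), split(b));
  }

  private static String[] split(String s) {
    String t = s.trim();
    return t.isEmpty() ? new String[0] : t.split(" +");
  }
}
```

For the pair S c h e r z;S E6 t s it returns 5, since only S is shared. The score reflects symbol identity rather than pronunciation, which is why "naive" is unsuitable for grapheme-phoneme alignment.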
-
-