loadLexicon.Rd
By default LaBB-CAT includes a layer manager called the Flat Lexicon Tagger, which can be configured to annotate words with data from a dictionary loaded from a plain text file (e.g. a CSV file). The file must have a 'flat' structure in the sense that it's a simple list of dictionary entries with a fixed number of columns/fields, rather than having a complex structure.
loadLexicon(
labbcat.url,
file,
lexicon,
field.delimiter,
field.names,
quote = "",
comment = "",
skip.first.line = FALSE,
no.progress = FALSE
)
URL to the LaBB-CAT instance.
The full path name of the lexicon file.
The name for the resulting lexicon, e.g. 'cmudict'. If the named lexicon already exists, it will be completely replaced with the contents of the file (i.e. all existing entries will be deleted before adding new entries from the file).
The character used to delimit fields in the file, e.g. ',' for Comma Separated Values (CSV) files. If this is " - ", rows are split on only the first space, in line with common dictionary formats.
A list of field names, delimited by field.delimiter, e.g. 'Word,Pronunciation'.
The character used to quote field values (if any), e.g. '"'.
The character used to indicate that a line is a comment rather than an entry (if any), e.g. '#'.
Whether to ignore the first line of the file (because it contains field names).
TRUE to suppress the visual progress bar. Otherwise, a progress bar is shown when interactive().
An error message, or NULL if the upload was successful.
This function uploads such a lexicon file, for use in tagging tokens.
You must have editing privileges in LaBB-CAT in order to be able to use this function.
if (FALSE) {
## Upload the CMU Pronouncing Dictionary
loadLexicon(labbcat.url, "cmudict.txt", "cmudict", " - ", "Word - Pron",
            quote = "", comment = ";", skip.first.line = FALSE)
}
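As a further illustration, the sketch below writes a tiny CSV lexicon with a header row and then uploads it with skip.first.line = TRUE. The file name, the lexicon name 'mini', and the entries are hypothetical, and the upload itself is guarded because it assumes a reachable LaBB-CAT instance and editing privileges.

```r
## Build a small CSV lexicon file (hypothetical entries) in a temporary directory
lexicon.file <- file.path(tempdir(), "mini-lexicon.csv")
writeLines(c(
  "Word,Pronunciation",  # header row, to be skipped on upload
  "cat,k a t",
  "dog,d o g"
), lexicon.file)

if (FALSE) {
  ## Upload the file as a lexicon called 'mini' (hypothetical name)
  error <- loadLexicon(
    labbcat.url, lexicon.file, "mini",
    field.delimiter = ",",
    field.names = "Word,Pronunciation",
    skip.first.line = TRUE)
  ## loadLexicon returns NULL on success, otherwise an error message
  if (!is.null(error)) stop(error)
}
```

Checking the return value matters here because a failed upload is reported as a message rather than raised as an R error.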