Named Entity Recognition
Depending on the language of your transcripts, you may be able to automatically tag dates and names of people, places and organisations, using the Stanford Named Entity Recognizer (NER).
The Stanford NER has recognizers for:
- Arabic
- Chinese
- English
- French
- German
- Hungarian
- Italian
- Spanish
| Entities: | | PERSON | PERSON | | | | LOCATION | | | | | DATE |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Words: | President | Barack | Obama | was | born | in | Hawaii. | He | was | elected | in | 2008. |
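The per-word tags in the example above can be thought of as one label per word, with adjacent words of the same label forming a single entity. The following is a minimal sketch of that idea, not part of LaBB-CAT or Stanford NER: the `tagged_words` list and the `group_entities` helper are hypothetical, and the sketch assumes the tags fall on "Barack", "Obama", "Hawaii." and "2008." as in the example sentence.

```python
from itertools import groupby

# Hypothetical per-word tags for the example sentence; None means "no entity".
tagged_words = [
    ("President", None), ("Barack", "PERSON"), ("Obama", "PERSON"),
    ("was", None), ("born", None), ("in", None), ("Hawaii.", "LOCATION"),
    ("He", None), ("was", None), ("elected", None), ("in", None),
    ("2008.", "DATE"),
]


def group_entities(pairs):
    """Merge runs of adjacent words that share the same tag into one entity."""
    entities = []
    for label, run in groupby(pairs, key=lambda pair: pair[1]):
        if label is not None:  # skip runs of untagged words
            entities.append((label, " ".join(word for word, _ in run)))
    return entities


print(group_entities(tagged_words))
```

Merging adjacent words with the same label is a simplification: two different people named one after the other would be joined into a single entity.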
Several classifiers for English are included in the default installation. You can download classifiers for other languages from the Stanford CoreNLP site.
Different classifiers include different possible entity labels:
- 3 class : LOCATION, PERSON, ORGANIZATION
- 4 class : LOCATION, PERSON, ORGANIZATION, MISC
- 7 class : LOCATION, PERSON, ORGANIZATION, MONEY, PERCENT, DATE, TIME
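When deciding which English classifier to use, it can help to check which one covers the labels you need. The sketch below simply encodes the three label sets listed above; the names `LABELS_BY_CLASSIFIER` and `classifiers_covering` are illustrative and do not correspond to anything in LaBB-CAT or Stanford NER.

```python
# Label sets as listed above; the dictionary keys are illustrative names only.
LABELS_BY_CLASSIFIER = {
    "3 class": {"LOCATION", "PERSON", "ORGANIZATION"},
    "4 class": {"LOCATION", "PERSON", "ORGANIZATION", "MISC"},
    "7 class": {"LOCATION", "PERSON", "ORGANIZATION",
                "MONEY", "PERCENT", "DATE", "TIME"},
}


def classifiers_covering(needed):
    """Return the classifiers whose label set includes every needed label."""
    needed = set(needed)
    return [name for name, labels in LABELS_BY_CLASSIFIER.items()
            if needed <= labels]


# Dates are only tagged by the 7 class classifier.
print(classifiers_covering({"PERSON", "DATE"}))
```

For example, if you want dates tagged as well as people, only the 7 class option qualifies, whereas MISC is available only in the 4 class option.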
The steps for Named Entity tagging your corpus are:
- Install the Layer Manager
- Configure a Named Entity layer
Install the Layer Manager
- In LaBB-CAT, select the layer managers option on the menu at the top.
- At the bottom, follow the link labelled: List of layer managers that are not yet installed.
- Find the StanfordNERecognizer layer manager in the list, and press its Install button, then Install again.
You will see a configuration page with some information about the tagger, and an option to upload a tagger file, which we don’t need to do.
- Press Configure.
You will see a progress bar while the layer manager downloads the Stanford NER files.
Once it’s finished, you’ll see a further information page.
Create a Named Entity layer
Now that the layer manager is installed, we need to create a layer that is configured to use it to tag words that are names of entities.
- Select word layers on the menu at the top.
- You will see a list of word tag layers that have already been configured. The column headings at the top are also a form for creating a new layer, so we’ll fill in that form now.
- Fill in the following details on the form at the top:
  - Layer ID: namedEntity (see footnote 1)
  - Type: Text
  - Alignment: None
  - Manager: Stanford Named Entity Recognizer
  - Generate: Always
  - Project: This can be left as the default value, unless you want to add the layer to a category of your choice.
- Press New
You will see the layer configuration form. Mostly you can leave the default values as they are.
- By default, the Classifier to use options are all for English data; you can choose whether 3, 4, or 7 types of entities are tagged.
If your data is not in English, you can download a recognizer for your language from the Stanford CoreNLP website.
To install a classifier:
- Download the corresponding files from the CoreNLP website
- In the layer configuration form, set Classifier to use to [other classifier]
You will be asked to select a file.
- Select the file you just downloaded.
After a short delay while the file is uploaded, you’ll see a message saying “Classifier(s) installed.”
- In the layer configuration form, you can now set Classifier to use to the classifier you just installed.
- Press Set Parameters.
- Now press Regenerate to run the Named Entity Recognizer on your whole corpus.
You will see a progress bar while the transcripts are being tagged.
- Once it’s complete, select the transcripts option on the menu, and click the first transcript in the list.
- Tick the new namedEntity layer to display the tags.
You will see that some words have a tag above them - these identify the type of named entity found by the classifier.

Footnotes
1. There may already be a layer called entity in your LaBB-CAT configuration under phrase layers, so if you want to call this layer entity, you will have to delete the phrase layer first.



