Two tiny models for Named Entity Recognition
Our latest models are available now on HuggingFace and Minibase.
TL;DR: We’re releasing two compact models for Named Entity Recognition (NER). They run locally on a CPU and quickly identify people, organizations, and locations, with the Standard version reaching near-perfect recall. There are two versions: Standard and Small.
Both models are available on HuggingFace (Standard & Small) or Minibase.ai for fine-tuning or API calls.
NER models extract names and places from text. They’re used in search engines, finance systems, and research pipelines to turn unstructured text into data. Most existing models for this task are large or slow, so we trained small models (both under 400 megabytes) that run locally and output structured JSON. Each model took about an hour to fine-tune on Minibase, with zero code.
We evaluated each model on precision (how many of the predicted entities are correct), recall (how many of the true entities the model finds), and F1 score (which combines the two). For the Standard model, which is 369 MB in size, those metrics are:
Precision: 91.5%
Recall: 100%
F1 Score: 95.1%
Latency: 323 ms
For the Small model, which is 143 MB, those metrics are:
Precision: 63%
Recall: 34.3%
F1 Score: 43.5%
Latency: 76.6 ms
The Standard model is clearly much more accurate, though at roughly four times the latency. We recommend using it over the Small variant.
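To make the metrics concrete, here is a minimal sketch of how precision, recall, and F1 can be computed for a single sentence by comparing predicted entities against a reference annotation. It is not our evaluation script: the helper names and the extra "Redmond" prediction are hypothetical, and the sketch assumes exact-match scoring on (label, text) pairs.

```python
from typing import Dict, List, Set, Tuple

Entity = Tuple[str, str]  # (label, text), e.g. ("LOC", "London")


def to_entity_set(output: Dict[str, List[str]]) -> Set[Entity]:
    """Flatten a {"LABEL": ["text", ...]} dict into a set of (label, text) pairs."""
    return {(label, text) for label, texts in output.items() for text in texts}


def precision_recall_f1(gold: Set[Entity], predicted: Set[Entity]) -> Tuple[float, float, float]:
    """Exact-match scoring: an entity counts only if label and text both match."""
    true_positives = len(gold & predicted)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


# Reference annotation for one sentence.
gold = to_entity_set(
    {"PER": ["Satya Nadella"], "ORG": ["Microsoft Corporation"], "LOC": ["London"]}
)
# Hypothetical prediction: every true entity is found, plus one spurious location.
predicted = to_entity_set(
    {"PER": ["Satya Nadella"], "ORG": ["Microsoft Corporation"], "LOC": ["London", "Redmond"]}
)

precision, recall, f1 = precision_recall_f1(gold, predicted)
print(f"precision={precision:.3f} recall={recall:.3f} f1={f1:.3f}")
```

In this toy case the model finds everything (recall of 1.0) but one of its four predictions is wrong (precision of 0.75). That is the same pattern the Standard model shows: high recall, with precision as the limiting factor.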
Examples
Input:
Microsoft Corporation announced that Satya Nadella will visit London next week.
Output:
{"PER": ["Satya Nadella"], "ORG": ["Microsoft Corporation"], "LOC": ["London"]}
Input:
The University of Cambridge is located in the United Kingdom and was founded by King Henry III.
Output:
{"PER": ["King Henry III"], "ORG": ["University of Cambridge"], "LOC": ["United Kingdom"]}
Input:
John Smith works at Google in New York and uses Python programming language.
Output:
{"PER": ["John Smith"], "ORG": ["Google"], "LOC": ["New York"], "MISC": ["Python"]}
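Because the output is a JSON object keyed by entity type, it is easy to consume downstream. Here is a minimal sketch, assuming the model's response arrives as a standard JSON string with straight double quotes. The function name `flatten_entities` is hypothetical and not part of any Minibase or HuggingFace API.

```python
import json
from typing import List, Tuple


def flatten_entities(raw_output: str) -> List[Tuple[str, str]]:
    """Parse the model's JSON output into a flat list of (text, label) pairs."""
    parsed = json.loads(raw_output)
    pairs = []
    for label, texts in parsed.items():
        for text in texts:
            pairs.append((text, label))
    return pairs


# The last example output above, as a raw string.
raw = '{"PER": ["John Smith"], "ORG": ["Google"], "LOC": ["New York"], "MISC": ["Python"]}'
entities = flatten_entities(raw)

for text, label in entities:
    print(f"{label:5} {text}")
```

A flat list like this is convenient for loading into a database table or a dataframe, one row per entity.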
Like all Minibase models, these are being released under Apache 2.0. You can download either model and use it for free. To share results or feedback, join the Minibase Discord.