GLiNER: A Generalist Model for Named Entity Recognition using Bidirectional Transformers

1. What is Named Entity Recognition (NER)?
Named Entity Recognition (NER) is a fundamental task in Natural Language Processing (NLP) that involves identifying and classifying spans of text that refer to real-world entities such as:
- Persons (e.g., “Albert Einstein”),
- Organizations (e.g., “United Nations”),
- Locations (e.g., “Paris”),
- Dates, Products, Diseases, and many more.
Traditional NER systems are trained on a fixed set of entity types, which limits their adaptability to new domains or tasks. Recently, Open NER has emerged as a flexible paradigm that allows recognizing arbitrary entity types based on natural language instructions — a direction GLiNER directly embraces and enhances.
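For example, given a single sentence, an NER system should return each entity span together with its type. A minimal illustration using plain Python data structures (the character offsets and type names are purely illustrative, not tied to any particular library):

```python
text = "Albert Einstein joined the Institute for Advanced Study in Princeton in 1933."

# Expected NER output: each entity is a text span plus a type label.
# start/end are character offsets into `text`.
entities = [
    {"span": "Albert Einstein",              "type": "person",       "start": 0,  "end": 15},
    {"span": "Institute for Advanced Study", "type": "organization", "start": 27, "end": 55},
    {"span": "Princeton",                    "type": "location",     "start": 59, "end": 68},
    {"span": "1933",                         "type": "date",         "start": 72, "end": 76},
]
```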
2. Overview
GLiNER is a compact and general-purpose model for Named Entity Recognition (NER) that leverages Bidirectional Transformers (like BERT or DeBERTa) to extract arbitrary types of entities from text — without being constrained to a fixed label set. Unlike traditional NER models or large language models (LLMs) like ChatGPT, GLiNER is lightweight, efficient, and designed for zero-shot generalization across domains and languages.
Traditional NER systems are limited by a fixed ontology of entity types. While LLMs (e.g., GPT-3, ChatGPT) allow open-type NER via prompting, they are computationally expensive, slow (token-by-token decoding), and often impractical in production due to API cost and latency. GLiNER aims to:
- Retain the flexibility of LLMs in handling arbitrary entity types.
- Achieve high performance with orders of magnitude fewer parameters.
- Enable parallel extraction of entities rather than autoregressive generation.
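In practice, the open-source GLiNER package (linked at the end of this post) exposes a simple zero-shot interface. A minimal sketch, assuming the pip-installable `gliner` package and the `urchade/gliner_base` checkpoint; exact checkpoint names and return fields may differ across versions:

```python
# pip install gliner
from gliner import GLiNER

# Load a pretrained GLiNER checkpoint from the Hugging Face Hub.
model = GLiNER.from_pretrained("urchade/gliner_base")

text = "Pierre Curie and Marie Curie shared the 1903 Nobel Prize in Physics."
labels = ["person", "award", "date"]  # arbitrary entity types, no retraining needed

# Score all candidate spans against the requested types in one forward pass.
entities = model.predict_entities(text, labels, threshold=0.5)
for ent in entities:
    print(ent["text"], "=>", ent["label"])
```

Because the entity types are given as a prompt rather than baked into the label set, swapping in new types is just a matter of changing the `labels` list.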
3. Model Architecture
GLiNER reframes NER as a semantic matching problem between entity types and text spans in a shared latent space.
- Input Format: `[ENT] person [ENT] organization [ENT] location [SEP] Text...`
  - `[ENT]`: special token preceding each entity type.
  - `[SEP]`: separates the entity-type prompt from the input text.
- Bidirectional Encoder: A bidirectional language model (e.g., DeBERTa-v3) encodes both the entity types and the input text in a single pass.
- Span Representation Module: Computes span embeddings from token representations using a feedforward network: \[ S_{ij} = \text{FFN}(h_i \oplus h_j) \] where $h_i$ and $h_j$ are the representations of the span's first and last tokens.
- Entity Representation Module: Processes the entity-type embeddings into query vectors $q_t$ via another FFN.
- Matching Layer: Calculates the matching score \[ \phi(i, j, t) = \sigma(S_{ij}^T q_t) \] where $\sigma$ is the sigmoid function (see the sketch after this list).
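A compact PyTorch sketch of this scoring path; hidden size, FFN depth, and span enumeration details are illustrative assumptions, not the exact configuration from the paper:

```python
# Illustrative PyTorch sketch of GLiNER-style span/type matching.
import torch
import torch.nn as nn

class SpanTypeMatcher(nn.Module):
    def __init__(self, hidden: int = 768, max_span_len: int = 12):
        super().__init__()
        self.max_span_len = max_span_len
        # Span representation: FFN over concatenated start/end token states.
        self.span_ffn = nn.Sequential(nn.Linear(2 * hidden, hidden), nn.ReLU(),
                                      nn.Linear(hidden, hidden))
        # Entity-type representation: FFN over the [ENT] token states.
        self.type_ffn = nn.Sequential(nn.Linear(hidden, hidden), nn.ReLU(),
                                      nn.Linear(hidden, hidden))

    def forward(self, token_states, ent_states):
        # token_states: (T, hidden) encoder outputs for the text tokens
        # ent_states:   (E, hidden) encoder outputs at the [ENT] positions
        T = token_states.size(0)
        spans, index = [], []
        for i in range(T):                                         # span start
            for j in range(i, min(i + self.max_span_len, T)):      # span end
                spans.append(torch.cat([token_states[i], token_states[j]]))
                index.append((i, j))
        S = self.span_ffn(torch.stack(spans))   # (num_spans, hidden)
        q = self.type_ffn(ent_states)           # (E, hidden)
        # phi(i, j, t) = sigmoid(S_ij . q_t), computed for all pairs at once
        scores = torch.sigmoid(S @ q.T)         # (num_spans, E)
        return scores, index
```

At inference, every span/type pair whose score exceeds a threshold is kept, which is what lets GLiNER score all candidate spans and entity types in parallel rather than decoding them one token at a time.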
Training Strategy
- Objective: Binary cross-entropy loss over span/type pairs.
- Data Source: Trained on Pile-NER, a dataset derived from The Pile corpus with 44.8k passages and 13k entity types. Labels were generated by ChatGPT acting as a teacher model (data-level distillation).
  (Figure: prompting ChatGPT for entity extraction to build the Pile-NER dataset.)
- Techniques for robustness (sketched in the code below):
  - Negative sampling of entity types.
  - Random shuffling and dropping of entity-type prompts.
  - Span length cap (max 12 tokens) for efficiency.
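A rough sketch of how this prompt-level augmentation could look when building one training example. The 50% negative-sampling ratio matches the ablation table below, but the helper itself is a hypothetical illustration, not the authors' code:

```python
# Hypothetical sketch of entity-type prompt augmentation for one passage.
# neg_ratio and drop_prob are illustrative assumptions.
import random

def build_type_prompt(gold_types, all_types, neg_ratio=0.5, drop_prob=0.1):
    # Randomly drop some gold entity types so the model tolerates missing labels.
    kept = [t for t in gold_types if random.random() > drop_prob] or list(gold_types)
    # Negative sampling: add types that do NOT occur in this passage.
    negatives = [t for t in all_types if t not in gold_types]
    n_neg = int(len(kept) * neg_ratio / (1 - neg_ratio)) if neg_ratio < 1 else len(kept)
    sampled_neg = random.sample(negatives, min(n_neg, len(negatives)))
    # Random shuffling so the model is not sensitive to prompt order.
    prompt = kept + sampled_neg
    random.shuffle(prompt)
    return prompt  # e.g. ["person", "award", "chemical", ...]
```

Each (span, type) pair induced by the resulting prompt is then scored by the matching layer above and trained with binary cross-entropy: 1 if the span is annotated with that type, 0 otherwise.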
4. Experimental Results
- Zero-shot evaluation: Across 20 diverse, out-of-domain (OOD) NER benchmarks, GLiNER-L (0.3B parameters) outperforms ChatGPT, InstructUIE (11B), UniNER (13B), and even GoLLIE (7B) in most cases.
- Multilingual performance without multilingual training: GLiNER-Multi (mDeBERTa backbone) surpasses ChatGPT on 8 out of 11 languages (e.g., Spanish, German, Russian), showing strong generalization even on unseen scripts.
- Supervised fine-tuning: After fine-tuning on labeled datasets, GLiNER competes closely with or surpasses InstructUIE and performs nearly as well as UniNER (a larger LLaMA-based model). Pretraining on Pile-NER improves data efficiency, especially with small datasets.
Efficiency and Scalability
- GLiNER allows parallel inference for multiple entity types.
- Training Time: ~5 hours on a single A100 GPU for GLiNER-L.
- Parameter Sizes: 50M (S), 90M (M), 300M (L), compared to 7B–13B in baselines.
5. Ablation Insights
| Component | Effect |
|---|---|
| Negative sampling (50%) | Best F1 balance |
| Dropping entity types | +1.4 F1 on OOD datasets |
| DeBERTa-v3 backbone | Outperforms RoBERTa, BERT, ALBERT, ELECTRA |
- Earlier NER approaches include rule-based systems, sequence labeling (e.g., BiLSTM-CRF), and span classification.
- LLM-based models (e.g., InstructUIE, UniNER) use instruction-tuning or generation.
- GLiNER offers a middle ground: lightweight yet capable of open-type NER.
6. Conclusion
GLiNER is a generalist, scalable, and high-performing model for Named Entity Recognition that:
- Bridges the gap between classic NER and large LLM-based models.
- Achieves state-of-the-art zero-shot results with minimal resources.
- Demonstrates robust multilingual and cross-domain generalization.
This makes it an excellent candidate for real-world NER applications in low-resource, high-efficiency environments.
🔗 Code: https://github.com/urchade/GLiNER
📄 Paper: Urchade Zaratiana et al., GLiNER: Generalist Model for Named Entity Recognition using Bidirectional Transformer, arXiv:2311.08526