SHIELD: The Universal Framework Making AI Search Safer for Everyone

Pavel Kordik
Jun 24

Presented at UMAP 2025 in New York City

Imagine searching for "glass tubing" and getting recommendations for drug manufacturing equipment. As AI-powered search becomes ubiquitous — from online marketplaces to social networks — the stakes for getting it wrong have never been higher.

The Rise of Natural Language Expectations

The success of ChatGPT and similar conversational AI has fundamentally changed user behavior. People have learned that machines can understand natural language, and they're applying this expectation everywhere. Instead of typing "red shoes size 8," users now search with complex, conversational queries like "comfortable red shoes for my evening work commute that won't hurt after 10 hours."

Companies like LinkedIn are showing users that they can use natural language queries in their digital products.

This shift is forcing online services to rapidly enhance their capabilities. Traditional keyword-based search feels primitive when users expect semantic understanding. Companies are turning to advanced solutions like Recombee's semantic search to meet these new expectations, enabling their platforms to interpret intent rather than just match words.

But here's the challenge: when your catalog contains sensitive or potentially harmful items — from chemicals and tools to adult content — natural language queries can lead users (and your recommendation algorithms) into dangerous territory. A conversational query might seem innocent but actually be seeking something problematic.

This is where alignment becomes critical. Our research proposes a systematic approach to ensure that semantic search systems understand not just what users are asking for, but whether they should be helped to find it.

Presenting SHIELD at UMAP 2025, a paper by Filip Špaček, Vojtěch Vančura and Pavel Kordík

That's why we built SHIELD: a framework that teaches AI systems to recognize harmful, sensitive, and safe queries before they cause damage.

The Hidden Dangers of Semantic Search

Semantic search has transformed digital interactions by understanding natural language rather than relying on exact keywords. This makes it powerful for guiding users to relevant content — but also risky. In environments like marketplaces or social networks, ambiguous or malicious queries can trigger harmful or offensive recommendations. Over-filtering sensitive content may alienate users seeking it intentionally, while under-filtering risks exposing others to unsafe material. Without safeguards, even well-meaning AI systems can inadvertently cause harm.

What Makes SHIELD Different

SHIELD (Semantic Harmful-content Identification and Ethical Labeling Dataset) isn't just another content moderation tool — it's a complete methodology for building ethical AI systems from the ground up.

SHIELD is not a static dataset but rather a methodology for constructing and curating labeled data for training classification models. The methodology involves three key stages:

1. Hierarchical Query Generation

The process begins by defining a hierarchical structure of content types relevant to the intended application. For instance, this may involve:

  • A set of broad categories (e.g., safety violations, legal issues, misinformation)
  • A corresponding set of fine-grained subcategories

Next, for each subcategory, large language models are used to generate realistic, diverse query examples. Typically, a fixed number of candidate queries is produced per subcategory (e.g., 20), ensuring sufficient breadth and coverage.
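The sketch below illustrates this generation stage, assuming a simple two-level taxonomy and an arbitrary LLM wrapper. The taxonomy entries, prompt wording, and the `llm_generate` helper are illustrative assumptions, not the exact setup from the paper.

```python
# Illustrative sketch of hierarchical query generation (taxonomy entries and
# prompt wording are assumptions, not the paper's exact configuration).

TAXONOMY = {
    "safety_violations": ["weapon assembly", "hazardous chemicals"],
    "legal_issues": ["counterfeit goods", "stolen property"],
    "misinformation": ["health misinformation", "financial scams"],
}

QUERIES_PER_SUBCATEGORY = 20  # fixed candidate budget per subcategory


def build_prompt(category: str, subcategory: str, n: int) -> str:
    """Compose an instruction asking an LLM for realistic search queries."""
    return (
        f"Generate {n} short, realistic marketplace search queries that a user "
        f"might type related to '{subcategory}' (broad category: '{category}'). "
        "Return one query per line."
    )


def generate_candidates(llm_generate) -> dict[tuple[str, str], list[str]]:
    """Collect candidate queries for every (category, subcategory) pair.

    `llm_generate` is any callable mapping a prompt string to model text,
    e.g. a thin wrapper around your LLM provider of choice.
    """
    candidates = {}
    for category, subcategories in TAXONOMY.items():
        for sub in subcategories:
            raw = llm_generate(build_prompt(category, sub, QUERIES_PER_SUBCATEGORY))
            queries = [line.strip() for line in raw.splitlines() if line.strip()]
            candidates[(category, sub)] = queries[:QUERIES_PER_SUBCATEGORY]
    return candidates
```

Generating per subcategory also makes it easy to spot coverage gaps in the taxonomy before any filtering happens.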

2. Quality Scoring and Filtering with a Reward Model

Not all generated queries are equally informative or representative. To refine the dataset:

  • Each query is evaluated using a reward model (such as Skywork-Reward), which assigns a relevance or quality score.
  • Based on these scores, only the top-ranked examples are retained for downstream use.

This filtering step ensures:

  • Higher semantic clarity and intent precision
  • Better separation between target classes
  • Reduced labeling noise in the final dataset

Importantly, the exact number of categories and classes can vary between implementations. In one use case, a three-class system (safe, sensitive, harmful) was used — but the SHIELD methodology can easily be extended to more classes based on the application’s ethical and operational needs.
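A minimal sketch of this filtering step is shown below. The reward model is abstracted behind a `score_query` callable (the paper mentions Skywork-Reward), and the keep ratio is an assumed parameter; the actual selection criterion may differ.

```python
# Minimal sketch of reward-model filtering; `score_query` stands in for any
# reward model (e.g. Skywork-Reward), and keep_ratio is an assumed knob.

from typing import Callable


def filter_by_reward(
    candidates: dict[tuple[str, str], list[str]],
    score_query: Callable[[str], float],
    keep_ratio: float = 0.5,
) -> dict[tuple[str, str], list[str]]:
    """Keep only the top-scoring queries within each subcategory."""
    filtered = {}
    for key, queries in candidates.items():
        ranked = sorted(queries, key=score_query, reverse=True)
        keep = max(1, int(len(ranked) * keep_ratio))
        filtered[key] = ranked[:keep]
    return filtered
```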

3. Training and Deployment

The curated dataset is then used to train AI models capable of classifying incoming queries based on learned semantic and ethical signals. These models can be integrated into moderation pipelines, chatbot backends, or safety layers for various applications.
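As a concrete illustration, the snippet below fine-tunes a generic transformer classifier on the three-class labels from the curated dataset using the Hugging Face Trainer API. The base model and hyperparameters are assumptions for illustration, not the exact recipe behind the models evaluated in the paper.

```python
# Hedged sketch: fine-tuning a query classifier on a SHIELD-style dataset.
# Base model and hyperparameters are illustrative assumptions.

from datasets import Dataset
from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                          Trainer, TrainingArguments)

LABELS = ["safe", "sensitive", "harmful"]  # three-class setup from the use case


def train_query_classifier(texts, label_ids, base_model="bert-base-uncased"):
    """Fine-tune a transformer to map queries to SHIELD labels (integer ids)."""
    tokenizer = AutoTokenizer.from_pretrained(base_model)
    model = AutoModelForSequenceClassification.from_pretrained(
        base_model, num_labels=len(LABELS))

    ds = Dataset.from_dict({"text": texts, "label": label_ids})
    ds = ds.map(
        lambda batch: tokenizer(batch["text"], truncation=True,
                                padding="max_length", max_length=64),
        batched=True,
    )

    args = TrainingArguments(output_dir="shield-classifier",
                             num_train_epochs=3,
                             per_device_train_batch_size=32)
    Trainer(model=model, args=args, train_dataset=ds).train()
    return tokenizer, model
```

Once trained, the classifier sits in front of the search or recommendation pipeline and scores each incoming query before retrieval is triggered.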

Use Case: Moderating User Input in Online Marketplaces

To evaluate SHIELD in a real-world context, we applied the methodology to the moderation of user-generated queries in an online marketplace — a setting where misuse can take many forms: scams, illegal activity, or inappropriate communication.

Example of SHIELD powered detection in online marketplace environment

The dataset used for this evaluation was built through structured generation and then filtered using a reward model to retain only high-confidence examples. After this filtering process, the final dataset comprised 17,170 queries classified as harmful and 8,871 as sensitive, selected from an initially much larger set of generated examples. These queries reflect realistic search behavior in marketplace contexts and are cleanly separated into well-aligned classes suitable for model training.

Using this dataset, three classification approaches were tested. The first, based on BM25 keyword similarity, achieved 93.2% accuracy, offering a computationally light yet reasonably effective solution. The second approach, which employed semantic embeddings with FAISS for nearest neighbor classification, improved performance to 96.5%, demonstrating the benefit of deeper contextual understanding. The highest accuracy — 98.4% with a 98.6% F1-score — was obtained using MoralBERT, a fine-tuned transformer model trained directly on the SHIELD dataset. This method provided the most robust generalization, especially in handling subtle or adversarial queries, though it came with higher computational costs.
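For intuition, here is a hedged sketch of the second approach described above: semantic embeddings with FAISS nearest-neighbor voting. The embedding model and the number of neighbors are assumptions; the paper's exact configuration may differ.

```python
# Hedged sketch of the embedding + FAISS nearest-neighbor baseline.
# Encoder choice and k are assumptions for illustration.

from collections import Counter

import faiss
import numpy as np
from sentence_transformers import SentenceTransformer

encoder = SentenceTransformer("all-MiniLM-L6-v2")  # assumed encoder


def build_index(train_texts, train_labels):
    """Embed labeled queries and index them for cosine-similarity search."""
    emb = encoder.encode(train_texts, normalize_embeddings=True)
    index = faiss.IndexFlatIP(emb.shape[1])  # inner product == cosine on unit vectors
    index.add(np.asarray(emb, dtype="float32"))
    return index, list(train_labels)


def classify(query, index, labels, k=5):
    """Label a query by majority vote among its k nearest labeled neighbors."""
    q = encoder.encode([query], normalize_embeddings=True)
    _, idx = index.search(np.asarray(q, dtype="float32"), k)
    votes = Counter(labels[i] for i in idx[0])
    return votes.most_common(1)[0][0]
```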

These results demonstrate that SHIELD can provide a strong foundation for query-level content moderation in semantic search systems, particularly when dealing with complex ethical boundaries in commercial platforms.

See more in our research paper [1].

Extending SHIELD to Other Domains: Ideas and Examples

The strength of SHIELD lies in its flexibility — its modular pipeline for query generation, filtering, and classification can be adapted to many real-world applications. Below are several domains where SHIELD can be used, along with examples of potential category structures and alignment objectives for each.


Applications Beyond Marketplaces: Domain-Specific Examples

🏥 Healthcare & Wellness
SHIELD can support chatbots and digital health tools by identifying ethically risky queries such as unsafe medical advice (e.g., self-medicating, harmful home remedies), misdiagnosis risks (ambiguous or misleading symptoms), mental health crises (suicidal ideation, self-harm), and misinformation (vaccine myths, dietary pseudoscience). The goal is to flag or escalate such queries for human oversight when necessary.

🏛️ Government and Civic Tech
Civic platforms benefit from filtering legally ambiguous queries (e.g., exploiting legal loopholes), threats, hate speech, and manipulative content (e.g., conspiracy-laden inputs). SHIELD can promote civil discourse and policy compliance while maintaining trust in public services.

🎓 Educational Technology
AI tutors can detect academic dishonesty (e.g., cheating requests), cyberbullying, and inappropriate content (e.g., explicit jokes). SHIELD helps reinforce ethical learning behavior and safe classroom interactions.

💼 Enterprise Security & Compliance
For internal tools, SHIELD can flag policy violations (e.g., bypass attempts, leaking sensitive data), toxic communication (e.g., harassment), and compliance risks (e.g., GDPR misuse). These capabilities aid in early intervention and internal risk management.

🧠 Mental Health & Wellbeing
Self-help apps can use SHIELD to identify crisis signals (suicidal language, hopelessness), interpersonal distress (abuse, trauma), and harmful coping patterns (substance abuse, isolation), enabling appropriate referrals and support.

📰 Media and News Moderation
Comment sections can be moderated by detecting inflammatory language (e.g., incitement, hate speech), disinformation (false claims, conspiracies), and manipulation tactics (deep fakes, revisionism). SHIELD can help maintain healthy discourse and safeguard editorial integrity.


SHIELD enables developers to design domain-specific query classifiers by simply adjusting:

  1. The taxonomic structure (categories and subcategories)
  2. The query generation prompts
  3. The reward model filtering criteria
  4. The downstream model training and deployment strategy

This adaptability means SHIELD can serve as a building block for creating AI systems that understand and enforce ethical boundaries — whether in education, healthcare, public policy, or enterprise.
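In practice, those four knobs can live in a single domain configuration that drives the whole pipeline. The sketch below shows what such a configuration might look like for a healthcare deployment; every name, category, and threshold here is an illustrative assumption.

```python
# Illustrative domain configuration (healthcare); all names, categories, and
# thresholds are assumptions, not values from the paper.

HEALTHCARE_CONFIG = {
    "taxonomy": {
        "unsafe_medical_advice": ["self-medication", "harmful home remedies"],
        "mental_health_crisis": ["suicidal ideation", "self-harm"],
        "misinformation": ["vaccine myths", "dietary pseudoscience"],
    },
    "generation_prompt": (
        "Generate {n} realistic health-related user queries about '{sub}' "
        "(category: '{cat}'). Return one query per line."
    ),
    "reward_filter": {"model": "Skywork-Reward", "keep_ratio": 0.4},
    "classifier": {
        "base_model": "bert-base-uncased",
        "labels": ["safe", "sensitive", "harmful"],
    },
}
```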

Beyond Content Moderation: Building Trust

SHIELD represents a fundamental shift in how we think about AI safety. Rather than reactive content filtering, it enables proactive alignment — training systems to understand ethical boundaries before they encounter real users.

This approach builds something even more valuable than safety: trust. When users know an AI system has been designed with their wellbeing in mind, adoption increases and satisfaction soars.

Open Source, Open Model, Ready to Deploy

The entire SHIELD framework is available for immediate use.

Whether you're building the next generation of search engines, virtual assistants, or specialized AI tools, SHIELD provides the ethical foundation your users deserve.

The Bottom Line

Every AI system that interacts with users in natural language needs ethical guardrails. SHIELD makes implementing those guardrails not just possible, but practical and scalable.

The question isn't whether you need content moderation for your AI system—it's whether you can afford to deploy without it.

Ready to build safer AI? Start with SHIELD.


This research was supported by the FIT CTU Student Research Program, Recombee, and the PoliRuralPlus project (EU Grant 101136910).

References

[1] F. Spacek, V. Vancura, and P. Kordik, “Mitigating Risks in Marketplace Semantic Search: A Dataset for Harmful and Sensitive Query Alignment,” in Proceedings of the 33rd ACM Conference on User Modeling, Adaptation and Personalization, 2025, pp. 329–334. doi: 10.1145/3699682.3728329.
