feat(guardrails): add EncodedPayloadScanner for obfuscated injection detection #51
Open
u7k4rs6 wants to merge 1 commit into
…detection

Closes future-agi#50.

Adds a new scanner that detects base64, hex, percent-encoded, and unicode/hex-escape blobs, decodes them, and rescans the decoded text for prompt-injection markers. Only blobs that decode to injection content (confidence 0.9) cross the block threshold; benign encoded data (hashes, tokens, image fragments) passes cleanly.

- New file: python/fi/evals/guardrails/scanners/encoded_payload.py
- Registered as "encoded_payload" via @register_scanner
- Exported from __init__.py, added to __all__
- Wired into create_default_pipeline(encoded_payload=False) (off by default, same policy as urls and invisible_chars)
- 6 new tests in TestEncodedPayloadScanner covering b64/hex/percent injection detection and benign b64, hex hash, and clean-text passes (all green)
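The decode-then-rescan flow described in the commit message can be sketched roughly as follows. This is a minimal illustration, not the code in encoded_payload.py: the function name `scan_encoded_payload`, the marker patterns, the blob-length thresholds, and the result dictionary are all assumptions made for the example. Only the 0.9 confidence value and the list of encodings come from the description above, and unicode/hex-escape handling is omitted for brevity.

```python
import base64
import binascii
import re
from urllib.parse import unquote

# Hypothetical marker list; the PR's real patterns are not shown on this page.
INJECTION_MARKERS = re.compile(
    r"ignore (all )?previous instructions|disregard (the )?system prompt",
    re.IGNORECASE,
)

# Assumed thresholds: only runs long enough to hide a phrase are considered.
BASE64_BLOB = re.compile(r"[A-Za-z0-9+/]{16,}={0,2}")
HEX_BLOB = re.compile(r"\b(?:[0-9a-fA-F]{2}){8,}\b")


def _decoded_candidates(text):
    """Yield (encoding, decoded_text) for blobs that decode to valid UTF-8."""
    for match in BASE64_BLOB.finditer(text):
        try:
            decoded = base64.b64decode(match.group(), validate=True).decode("utf-8")
        except (binascii.Error, UnicodeDecodeError):
            continue  # not valid base64, or not text: treat as benign
        yield "base64", decoded
    for match in HEX_BLOB.finditer(text):
        try:
            decoded = bytes.fromhex(match.group()).decode("utf-8")
        except ValueError:  # also covers UnicodeDecodeError
            continue
        yield "hex", decoded
    if "%" in text:
        decoded = unquote(text)
        if decoded != text:
            yield "percent", decoded


def scan_encoded_payload(text, block_confidence=0.9):
    """Flag text only when an encoded blob decodes to an injection marker."""
    for encoding, decoded in _decoded_candidates(text):
        if INJECTION_MARKERS.search(decoded):
            return {"passed": False, "encoding": encoding, "confidence": block_confidence}
    return {"passed": True, "encoding": None, "confidence": 0.0}


if __name__ == "__main__":
    payload = base64.b64encode(b"ignore all previous instructions").decode()
    print(scan_encoded_payload("please run " + payload))
    print(scan_encoded_payload("What is the weather today?"))
```

In this sketch, a blob that fails to decode, or that decodes to something without a marker, is simply ignored. That is one way to get the stated behaviour where hashes and tokens pass cleanly while decoded injection text is blocked.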
Closes #50.
Summary
- EncodedPayloadScanner: a local, dependency-free scanner that detects base64, hex, percent-encoded, and unicode/hex-escape blobs, decodes them, then rescans the decoded text for prompt-injection markers.
- Registered as "encoded_payload" via @register_scanner, exported from __init__.py and added to __all__, wired into create_default_pipeline(encoded_payload=False): off by default, same policy as urls and invisible_chars.

Motivation
Keyword/regex-based scanners all operate on raw text. An adversary can bypass every one of them by base64-encoding an injection string: a blob that decodes to "ignore all previous instructions" passes JailbreakScanner, CodeInjectionScanner, and SecretsScanner unchanged. EncodedPayloadScanner closes this class of bypass.

Changes
| File | Change |
| --- | --- |
| python/fi/evals/guardrails/scanners/encoded_payload.py | New file |
| python/fi/evals/guardrails/scanners/__init__.py | __all__, create_default_pipeline param |
| python/tests/sdk/test_guardrails_scanners.py | TestEncodedPayloadScanner: 6 cases |

Test plan
cd python
uv sync --dev
uv run pytest tests/sdk/test_guardrails_scanners.py::TestEncodedPayloadScanner -v

Expected: