Presidio - Data Protection and Anonymization API
Context-aware, pluggable and customizable PII anonymization service for text and images.
What is Presidio
Presidio (from Latin praesidium, ‘protection, garrison’) helps ensure that sensitive text is properly managed and governed. It provides fast analysis and anonymization of sensitive data such as credit card numbers, names, locations, social security numbers, bitcoin wallets, US phone numbers and financial data. Presidio analyzes text using predefined or custom recognizers that identify entities, patterns, formats and checksums, taking relevant context into account. Presidio uses Docker and Kubernetes to run workloads at scale.
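To make the combination of patterns, checksums and context more concrete, here is a minimal, self-contained sketch of a single recognizer. It is not Presidio's code: the names `find_credit_cards`, `CARD_PATTERN` and `CONTEXT_WORDS`, the base score of 0.5 and the context boost of 0.35 are hypothetical choices for illustration. A regular expression finds candidate card numbers, the Luhn checksum discards digit runs that cannot be valid card numbers, and a nearby context word such as "credit" or "card" raises the confidence score.

```python
import re
from dataclasses import dataclass
from typing import List


@dataclass
class Detection:
    entity_type: str
    start: int
    end: int
    score: float


# Hypothetical pattern: 13 to 19 digits, optionally separated by single spaces or dashes.
CARD_PATTERN = re.compile(r"\b(?:\d[ -]?){12,18}\d\b")
# Hypothetical context words that make a match more likely to be a real card number.
CONTEXT_WORDS = {"credit", "card", "visa", "mastercard"}


def luhn_valid(number: str) -> bool:
    """Return True if the digit string passes the Luhn checksum."""
    total = 0
    # Double every second digit from the right; subtract 9 when the doubled value exceeds 9.
    for i, ch in enumerate(reversed(number)):
        d = int(ch)
        if i % 2 == 1:
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0


def find_credit_cards(text: str, window: int = 5) -> List[Detection]:
    detections = []
    for match in CARD_PATTERN.finditer(text):
        digits = re.sub(r"[ -]", "", match.group())
        if not luhn_valid(digits):
            continue  # Right format, wrong checksum: probably not a card number.
        score = 0.5  # Hypothetical base score for a pattern + checksum match.
        # Look at up to `window` words before the match for supporting context.
        preceding = {w.strip(".,:;").lower() for w in text[: match.start()].split()[-window:]}
        if preceding & CONTEXT_WORDS:
            score += 0.35  # Hypothetical context boost.
        detections.append(Detection("CREDIT_CARD", match.start(), match.end(), score))
    return detections


print(find_credit_cards("my credit card number is 4111 1111 1111 1111"))
```

Splitting the work into a cheap pattern match, a stricter checksum and a context adjustment keeps false positives down without requiring a trained model for well-structured identifiers.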
Presidio can be integrated into any data pipeline for intelligent PII scrubbing. It is open-source, transparent and scalable. PII anonymization use cases often require detecting different sets of PII entities, some of which are domain- or business-specific, so Presidio lets you customize existing PII recognizers or add new ones via the API or in code to best fit your anonymization needs.
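Adding a business-specific entity can be pictured as registering one more recognizer alongside the generic ones. The sketch below is a toy model of that idea, not Presidio's API: `ToyRegistry`, `make_regex_recognizer` and the `EMPLOYEE_ID` format (`EMP-` followed by six digits) are all hypothetical. It also shows restricting analysis to specific entity types.

```python
import re
from typing import Callable, Dict, List, NamedTuple, Optional


class Detection(NamedTuple):
    entity_type: str
    start: int
    end: int
    score: float


# A recognizer is any callable that maps text to a list of detections.
Recognizer = Callable[[str], List[Detection]]


def make_regex_recognizer(entity_type: str, pattern: str, score: float) -> Recognizer:
    """Hypothetical helper: build a recognizer from a regular expression."""
    compiled = re.compile(pattern)

    def recognize(text: str) -> List[Detection]:
        return [Detection(entity_type, m.start(), m.end(), score)
                for m in compiled.finditer(text)]

    return recognize


class ToyRegistry:
    """Hypothetical registry: recognizers keyed by entity type, extendable at runtime."""

    def __init__(self) -> None:
        self._recognizers: Dict[str, Recognizer] = {}

    def add(self, entity_type: str, recognizer: Recognizer) -> None:
        self._recognizers[entity_type] = recognizer

    def analyze(self, text: str, entities: Optional[List[str]] = None) -> List[Detection]:
        # Run only the requested entity types, or every registered one if none are given.
        selected = entities if entities is not None else list(self._recognizers)
        results: List[Detection] = []
        for entity_type in selected:
            recognizer = self._recognizers.get(entity_type)
            if recognizer is not None:
                results.extend(recognizer(text))
        return sorted(results, key=lambda d: d.start)


registry = ToyRegistry()
# A generic recognizer (simplified email pattern, not a full RFC 5322 validator).
registry.add("EMAIL_ADDRESS",
             make_regex_recognizer("EMAIL_ADDRESS", r"[\w.+-]+@[\w-]+\.[\w.]+", 0.9))
# A business-specific recognizer for a made-up employee ID format.
registry.add("EMPLOYEE_ID",
             make_regex_recognizer("EMPLOYEE_ID", r"\bEMP-\d{6}\b", 0.8))

text = "Contact EMP-004217 at jane@example.com"
for detection in registry.analyze(text):
    print(detection.entity_type, text[detection.start:detection.end], detection.score)
```

Because recognizers share one small interface, a new domain entity only needs a new callable; the analysis loop itself does not change.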
Demo
Try Presidio with your own data
❗ Presidio V2 is coming!
In the next few weeks, we will start migrating the Presidio repo from the current version to a new version.
The main changes from V1 to V2 are:
- Replacing gRPC with HTTP to allow more customizable APIs and easier debugging.
- The anonymizer service will be Python based and pip installable.
- Focus on the analyzer and anonymizer services. Other services will be deprecated at first and potentially migrated over time to V2 with the help of the community.
- Better documentation, samples and build flows.
Notes:
- The current V1 code base will continue to be available but will no longer be officially supported.
- We will maintain backward compatibility with the current Presidio API for text analysis and anonymization.
We are certain this change will make Presidio much more accessible, maintainable and customizable and we look forward to collaborating with this great community!
Stay tuned!
Overview
Presidio API
API Spec - available APIs, request and response formats.
Presidio REST API Open API Spec
API Samples
- Simple Text Analysis
- Create Reusable Templates
- Detect Specific Entities
- Custom Anonymization
- Add Custom PII Entity Recognizer
- Image Anonymization
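Once entities are detected, anonymization rewrites each detected span according to a per-entity rule. The following sketch assumes a hypothetical template format (a dict from entity type to an operator function) and hypothetical operator names (`replace`, `redact`, `mask_all_but_last`, `sha256_hash`); it illustrates the general technique behind custom anonymization and reusable templates, not Presidio's actual template schema. Spans are rewritten from right to left so that the offsets of earlier spans stay valid, and detections are assumed not to overlap.

```python
import hashlib
from typing import Callable, Dict, Iterable, NamedTuple


class Detection(NamedTuple):
    entity_type: str
    start: int
    end: int


# Hypothetical operator signature: (original value, entity type) -> replacement text.
Operator = Callable[[str, str], str]


def replace(value: str, entity_type: str) -> str:
    """Replace the value with a placeholder naming its entity type."""
    return f"<{entity_type}>"


def redact(value: str, entity_type: str) -> str:
    """Remove the value entirely."""
    return ""


def mask_all_but_last(n: int, char: str = "*") -> Operator:
    """Build an operator that masks every character except the last n."""
    def mask(value: str, entity_type: str) -> str:
        keep = value[-n:] if n > 0 else ""
        return char * (len(value) - len(keep)) + keep
    return mask


def sha256_hash(value: str, entity_type: str) -> str:
    """Replace the value with its unsalted SHA-256 hex digest."""
    return hashlib.sha256(value.encode("utf-8")).hexdigest()


def anonymize(text: str, detections: Iterable[Detection],
              template: Dict[str, Operator], default: Operator = replace) -> str:
    # Assumes non-overlapping spans. Rewrite right to left so earlier offsets stay valid.
    for det in sorted(detections, key=lambda d: d.start, reverse=True):
        operator = template.get(det.entity_type, default)
        original = text[det.start:det.end]
        text = text[:det.start] + operator(original, det.entity_type) + text[det.end:]
    return text


# Hypothetical "template": mask card numbers, fall back to replace for everything else.
template = {"CREDIT_CARD": mask_all_but_last(4)}
text = "Card 4111 1111 1111 1111 belongs to John Smith"
detections = [Detection("CREDIT_CARD", 5, 24), Detection("PERSON", 36, 46)]
print(anonymize(text, detections, template))
```

Keeping operators as plain functions means a new transformation can be added without touching `anonymize`. A production service would also need a policy for overlapping detections, for example keeping the higher-scoring one.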
Learn more
More information can be found in the Presidio Documentation:
- Supported field types
- Database and storage scanner
- Adding new PII recognizers
- Generating Swagger file
- Evaluating Presidio
- Proto packages for Presidio API
Deploying Presidio on a Kubernetes Cluster
Follow the Deployment Guidelines for details:
- Single click deployment on a Kubernetes Cluster
- Step by Step Deployment with customizable parameters on a Kubernetes Cluster
Developing Presidio
- Setting Up a Development Environment
- Adding Custom Fields
- Recognizers Development - Best Practices and Considerations
- Using the Analyzer Service
- Calling the different services
- Connector Developer Guide
Deploy Presidio for Test and Dev
- Deploy locally using Docker
- Deploy locally using KIND
- Presidio-Analyzer as a standalone Python package
Current status of input/output components
| Module | Feature | Status |
|---|---|---|
| API | HTTP input | |
| Scanner | MySQL | |
| Scanner | MSSQL | |
| Scanner | PostgreSQL | |
| Scanner | Oracle | |
| Scanner | Azure Blob Storage | |
| Scanner | S3 | |
| Scanner | Google Cloud Storage | |
| Streams | Kafka | |
| Streams | Azure Event Hub | |
| Datasink (output) | MySQL | |
| Datasink (output) | MSSQL | |
| Datasink (output) | Oracle | |
| Datasink (output) | PostgreSQL | |
| Datasink (output) | Kafka | |
| Datasink (output) | Azure Event Hub | |
| Datasink (output) | Azure Blob Storage | |
| Datasink (output) | S3 | |
| Datasink (output) | Google Cloud Storage | |
✅ - Working
🔶 - Partially supported (alpha)
❌ - Not supported yet
How to contact us?
If you have a usage question, have found a bug, or have a suggestion for improvement, please file a GitHub issue. For other matters, please email presidio@microsoft.com
Contributing
For details on contributing to this repository, see the contributing guide.
This project welcomes contributions and suggestions. Most contributions require you to agree to a Contributor License Agreement (CLA) declaring that you have the right to, and actually do, grant us the rights to use your contribution. For details, visit https://cla.microsoft.com.
When you submit a pull request, a CLA-bot will automatically determine whether you need to provide a CLA and decorate the PR appropriately (e.g., label, comment). Simply follow the instructions provided by the bot. You will only need to do this once across all repos using our CLA.
This project has adopted the Microsoft Open Source Code of Conduct. For more information see the Code of Conduct FAQ or contact opencode@microsoft.com with any additional questions or comments.