Safe Harbor De-Identification
Six weeks of legal review.
Or one command.
You need production data for realistic testing. Legal needs a six-week review to approve the data copy request. The test deadline is Friday. You have been here before. This time, skip the request entirely.
The compliance bottleneck that kills testing timelines.
Every integration team hits the same wall. Staging needs realistic data. Realistic data means production messages. Production messages contain PHI. PHI requires legal review, compliance sign-off, data use agreements, and a chain of custody you will be auditing for years.
So you compromise. You hand-build 14 test patients named John Doe. You copy a production ADT, open it in a text editor, and start replacing names and MRNs by hand. You are careful. You catch the PID. You catch the NK1. But you miss the SSN buried in an NTE segment on message 147.
Now you have a reportable PHI breach because your scrubbing was Find-and-Replace in Notepad++. The problem is not carelessness. The problem is that manual de-identification does not scale, and production copy-and-scrub treats a compliance requirement as a text editing exercise.
All 18 identifier categories. Every time.
One missed SSN in a free-text NTE segment is a reportable breach. Regex misses it. Pidgeon parses the full HL7 abstract syntax tree and removes every structural identifier with certainty — not guesswork.
Deterministic hashing means the same input with the same salt produces the same output. Your team gets identical de-identified datasets. Reproducible. Auditable.
De-identify production messages.
When you need the real production payload — the message with the Z-segment your mapper has never seen, the OBX with the non-standard reference range — Post strips every HIPAA identifier locally. Zero cloud extraction. Zero data transmission. The messages never leave your machine.
$ pidgeon deident --in ./prod_samples --out ./safe_samples --date-shift 90d --salt "project-2026"
Processing 847 messages...
18 HIPAA identifier categories detected and replaced
Date shifting applied (+90 days) to all temporal fields
Cross-message referential integrity preserved
Zero transmission
Your PHI is not uploaded to a cloud service. It never touches our servers. It never crosses your network boundary. Your CISO can verify this in the first meeting.
Deterministic hashing
Same input, same salt, same output. Share the salt with your team and everyone gets identical de-identified datasets across every run.
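To make the idea concrete, here is a minimal sketch of salted deterministic hashing. It is not Pidgeon's implementation: the `pseudonymize` helper, the choice of HMAC-SHA256, and the 12-character token length are all assumptions made for illustration.

```python
import hashlib
import hmac


def pseudonymize(value: str, salt: str, length: int = 12) -> str:
    """Map an identifier to a stable token (hypothetical helper).

    Assumption: HMAC-SHA256 keyed by the salt. The real tool may use a
    different construction.
    """
    digest = hmac.new(salt.encode("utf-8"), value.encode("utf-8"), hashlib.sha256)
    return digest.hexdigest()[:length].upper()


# Same input and same salt: two teammates get the same token.
token_alice = pseudonymize("MRN-0012345", "project-2026")
token_bob = pseudonymize("MRN-0012345", "project-2026")

# Same input, different salt: the token changes.
token_other = pseudonymize("MRN-0012345", "another-project")

print(token_alice == token_bob)
print(token_alice == token_other)
```

The salt acts as a key. Anyone holding it can reproduce the mapping, so share it only with people who are allowed to regenerate the dataset.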
Referential integrity
A patient MRN replaced with a synthetic value is replaced consistently across every message in the batch. Cross-message references stay coherent.
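A rough sketch of how consistent replacement keeps cross-message references coherent. It assumes the MRN sits in the first component of PID-3 and reuses a salted HMAC as the mapping. The helper names and sample messages are hypothetical, only PID-3 is handled, and none of this is Pidgeon's actual code.

```python
import hashlib
import hmac


def pseudonymize(value: str, salt: str, length: int = 10) -> str:
    # Hypothetical mapping: salted HMAC-SHA256, truncated, prefixed with "X".
    digest = hmac.new(salt.encode("utf-8"), value.encode("utf-8"), hashlib.sha256)
    return "X" + digest.hexdigest()[:length].upper()


def replace_mrn(message: str, salt: str) -> str:
    """Replace the ID component of PID-3 in one HL7 v2 message."""
    out = []
    for segment in message.split("\r"):  # HL7 v2 segments end with a carriage return
        fields = segment.split("|")
        if fields[0] == "PID" and len(fields) > 3 and fields[3]:
            components = fields[3].split("^")
            components[0] = pseudonymize(components[0], salt)
            fields[3] = "^".join(components)
        out.append("|".join(fields))
    return "\r".join(out)


def pid3_id(message: str):
    """Return the ID component of PID-3, or None if there is no PID segment."""
    for segment in message.split("\r"):
        fields = segment.split("|")
        if fields[0] == "PID" and len(fields) > 3:
            return fields[3].split("^")[0]
    return None


# Two made-up messages about the same made-up patient.
admit = (
    "MSH|^~\\&|ADT|HOSP|||20260101120000||ADT^A01|MSG001|P|2.5.1\r"
    "PID|1||12345^^^HOSP^MR||DOE^JOHN"
)
result = (
    "MSH|^~\\&|LAB|HOSP|||20260102080000||ORU^R01|MSG002|P|2.5.1\r"
    "PID|1||12345^^^HOSP^MR||DOE^JOHN\r"
    "OBX|1|NM|2345-7^GLUCOSE||98|mg/dL"
)

safe_admit = replace_mrn(admit, "project-2026")
safe_result = replace_mrn(result, "project-2026")

# Both messages still point at the same (now synthetic) patient.
print(pid3_id(safe_admit) == pid3_id(safe_result))
```

Because the replacement depends only on the original value and the salt, no lookup table has to be carried between messages: the admit and the lab result resolve to the same synthetic MRN even when processed separately.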
Generate from nothing.
When you do not need the production payload and just need realistic test data, generate it from scratch. There is no PHI to de-identify because no PHI ever existed. No legal review. No compliance risk. No waiting.
$ pidgeon generate ADT^A01 --count 500 --vendor epic --output ./test_data/
Generated 500 HL7 v2.5.1 messages
Clinically correlated demographics
Vendor-realistic field patterns
Zero PHI by construction
No data use agreement. No six-week legal review. No chain of custody. The data was never real.
Prove it to compliance.
Post generates a compliance report based on actual detection results. Hand it to your privacy officer, or attach it to the data use agreement.
$ pidgeon deident --in ./prod_samples --out ./safe --date-shift 90d --report compliance.html
Audit-ready documentation
The report documents every identifier category scanned, every substitution made, and every field preserved. Attach it directly to your data use agreement.
Actual detection results
Not a policy document. An evidence document. The compliance report reflects what Pidgeon actually found and removed from your specific dataset.
The conversation with legal changes.
Before
“We need production data for testing.”
“File a data use agreement. We will review in six weeks.”
After
“The test data was generated synthetically. No PHI was involved at any stage. Here is the compliance report.”
There is nothing to scrub, nothing to approve, nothing to breach.
For QA and test data managers
This is the workflow that removes legal from the testing critical path. De-identify when you need production structure. Generate when you need volume. Either way, your test deadline is no longer blocked by a compliance review.
For integration engineers
The free CLI includes full de-identification. No Pro tier required. Point it at a directory and have safe test data before lunch.
De-identification is free. Right now.
The full Safe Harbor workflow — all 18 HIPAA identifier categories, deterministic hashing, date shifting, referential integrity — ships with the free CLI. No trial. No subscription. No strings.
Download the CLI (Mac / Windows / Linux)