Vibe the Module, Not the Data
While working with the FHIR to OMOP Service, I've seen good synthetic FHIR data created with commercial LLMs, custom-tailored for ConditionOnset and met with the typical amazement on return, but I have also witnessed some questionable trust in it firsthand on a call. This approach also falls short when generating gigantic payloads, which I need so I can go back to my interests on the backend and ensure smooth data transition.
Impostor syndrome quickly surfaced after a couple-day hiatus at the 2025 OHDSI Collaborator Showcase out in New Brunswick last October. A new approach to generating data was in order for any possibility of being invited to cocktail parties with these folks, so I leaned into the work of the pros over at MITRE Corporation who brought us Synthea.
I immediately noticed that a module for the complex Sickle Cell Disease did not exist in the modules folder in the Synthea repo. I have always known I was afforded the opportunity to write one, but this task would definitely need a different brain, one that the OHDSI community seems to have in abundance, but I do not.
The Vibe
Not a huge fan of this term, but it fits the distraction for sure, for lack of another term... Given that Synthea modules generate data based on a "ConditionOnset", let's create a Sickle Cell Disease module and generate a 1M-population FHIR Bulk Export from it.
{
"type": "ConditionOnset",
"target_condition": "sickle cell disease"
}
Prompt #1 - Do My Job for Me
Prompt #2 - Sure
The SCD Module
LGTM! The module that was created cited sources from the CDC almost exclusively. Here it is if you want to take a look at it, also visualized with the Synthea visualization utility.
🔗 https://github.com/sween/synthea/blob/43325b191185301a668062ed0bb75a2cf1...

Run
Let's grab the generator, some associated cheat codes, load up our module, and rip the Synthetic Bulk FHIR Export to a zip file.
git clone https://github.com/synthetichealth/synthea
cd synthea
Now, let's steal @Dmitry Zasypkin's ndjson fixer utility from his repo. It patches the references in the generated ndjson files so they can be processed.
https://raw.githubusercontent.com/dmitry-zasypkin/synthea-ndjson/refs/heads/main/patch-synthea-ndjsons.sh
Enable bulk FHIR in the synthea.properties file.
It also helps to keep only the FHIR resources relevant to the OMOP CDM.
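Since the screenshots do not travel well, here is a minimal sketch of those two property tweaks. The property names `exporter.fhir.bulk_data` and `exporter.fhir.included_resources` are what I understand Synthea's default synthea.properties to use, so verify them against your checkout. The resource list is purely an assumption for illustration, not the exact list used here, and `apply_overrides` is a hypothetical helper.

```python
import re

# Assumed overrides. The resource list is illustrative only; pick the
# resource types your own OMOP mapping actually consumes.
OVERRIDES = {
    "exporter.fhir.bulk_data": "true",
    "exporter.fhir.included_resources": (
        "Patient,Encounter,Condition,Observation,Procedure,MedicationRequest"
    ),
}


def apply_overrides(properties_text, overrides):
    """Return properties_text with each key in overrides set to its new value."""
    remaining = dict(overrides)
    out = []
    for line in properties_text.splitlines():
        # Match "key = value" or "key: value"; comment lines do not match.
        match = re.match(r"\s*([^#!=:\s]+)\s*[=:]", line)
        if match and match.group(1) in remaining:
            key = match.group(1)
            out.append(f"{key} = {remaining.pop(key)}")
        else:
            out.append(line)
    # Append any keys that were not already present in the file.
    for key, value in remaining.items():
        out.append(f"{key} = {value}")
    return "\n".join(out) + "\n"
```

To use it, read `src/main/resources/synthea.properties` from the Synthea checkout, pass the text through `apply_overrides`, and write the result back. Editing the file by hand works just as well.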
Then drop the generated SCD module in the modules folder.
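Before dropping a generated module in, a quick sanity check does not hurt. This is a minimal sketch, assuming only the basic shape of a Synthea generic module: a `name`, a `states` map, a state named `Initial`, and at least one `ConditionOnset` state. The toy module below is hypothetical and stripped down (no codes, no transitions worth trusting); it is not the generated SCD module.

```python
import json


def condition_onset_states(module):
    """Return the names of all ConditionOnset states in a module dict."""
    states = module.get("states", {})
    return sorted(
        name for name, state in states.items()
        if state.get("type") == "ConditionOnset"
    )


def check_module(module):
    """Return a list of problems; an empty list means the basic shape looks fine."""
    problems = []
    if not module.get("name"):
        problems.append("module has no name")
    states = module.get("states", {})
    if states.get("Initial", {}).get("type") != "Initial":
        problems.append("module has no Initial state")
    if not condition_onset_states(module):
        problems.append("module has no ConditionOnset state")
    return problems


# Hypothetical toy module, for illustration only.
EXAMPLE = json.loads("""
{
  "name": "Sickle Cell Disease (toy)",
  "states": {
    "Initial": {"type": "Initial", "direct_transition": "SCD_Onset"},
    "SCD_Onset": {"type": "ConditionOnset", "direct_transition": "Terminal"},
    "Terminal": {"type": "Terminal"}
  }
}
""")
```

Load the real module file with `json.load` and pass it to `check_module`. This only checks structure; it says nothing about whether the clinical logic is right.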
Now run a 1M-patient (-p) population synthetic generation for the State of Michigan for SCD.
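For reference, the run boils down to something like the sketch below. It assumes the stock `run_synthea` launcher in the repo root, `-p` taking a plain integer, and the state passed as a positional argument. Any extra flags from the original run are not reproduced here, and both function names are hypothetical.

```python
import subprocess


def synthea_command(population, state):
    """Build the argument list for the run_synthea launcher."""
    return ["./run_synthea", "-p", str(population), state]


def generate(population=1_000_000, state="Michigan", cwd="synthea"):
    """Run the generator inside the cloned repo; raises if it exits non-zero."""
    return subprocess.run(synthea_command(population, state), cwd=cwd, check=True)
```

Calling `generate()` from the directory that contains the `synthea` clone kicks off the run. Start with a small population first to confirm the module loads.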
Somewhere in all the terminal noise and CPU fans, you should see that your module was loaded, and then it is off to generate the ndjsons.
After just under an hour, we can now run patch-synthea-ndjsons.sh across the generated data...
And zip it all up into bulk FHIR export format...
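The zip step can be sketched like this, assuming the patched ndjson files sit in one directory (Synthea writes FHIR output under `output/fhir` by default) and that a flat archive with one ndjson per resource type is what the service wants. That layout is my assumption, and `zip_ndjson` is a hypothetical helper.

```python
import zipfile
from pathlib import Path


def zip_ndjson(source_dir, zip_path):
    """Pack every *.ndjson file in source_dir into a flat zip archive."""
    files = sorted(Path(source_dir).glob("*.ndjson"))
    with zipfile.ZipFile(zip_path, "w", compression=zipfile.ZIP_DEFLATED) as archive:
        for path in files:
            # arcname drops the directory so entries sit at the archive root.
            archive.write(path, arcname=path.name)
    return [path.name for path in files]
```

For example, `zip_ndjson("output/fhir", "scd-bulk-export.zip")` returns the list of files it packed, which makes for a handy check against the resource types you expected.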
And here is what it looks like on disk, if you are curious about the sizes.
Load
Upload the bulk FHIR payload to the S3 bucket.
Let the OMOP service do its thing...
Attestation
Although this is generally hand-waving to validate the data, let's just see whether SCD concepts are present in the data after transformation.
Now let's see if anybody has Sickle Cell Disease in the synthetic data.
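The kind of check behind those two looks can be sketched as below. It uses the standard OMOP CDM tables `condition_occurrence` and `concept`, but runs against an in-memory SQLite toy so it stays self-contained. The toy rows, the placeholder concept ids, and the name-based `LIKE` match are all assumptions for illustration; the actual query was run against the service's OMOP database and is not reproduced here.

```python
import sqlite3

# Count distinct persons per condition concept whose name mentions "sickle".
SCD_QUERY = """
SELECT c.concept_name, COUNT(DISTINCT co.person_id) AS persons
FROM condition_occurrence AS co
JOIN concept AS c ON c.concept_id = co.condition_concept_id
WHERE LOWER(c.concept_name) LIKE '%sickle%'
GROUP BY c.concept_name
ORDER BY persons DESC
"""


def scd_person_counts(conn):
    """Return (concept_name, distinct person count) rows for sickle-related concepts."""
    return conn.execute(SCD_QUERY).fetchall()


def toy_cdm():
    """Build a tiny in-memory stand-in for two OMOP tables (placeholder data)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE concept (concept_id INTEGER PRIMARY KEY, concept_name TEXT);
        CREATE TABLE condition_occurrence (
            condition_occurrence_id INTEGER PRIMARY KEY,
            person_id INTEGER,
            condition_concept_id INTEGER
        );
        INSERT INTO concept VALUES
            (1, 'Sickle cell disease (placeholder)'),
            (2, 'Some other condition (placeholder)');
        INSERT INTO condition_occurrence VALUES
            (1, 100, 1), (2, 100, 1), (3, 101, 1), (4, 102, 2);
    """)
    return conn
```

Against a real CDM, matching on curated concept ids from the vocabulary is the sturdier choice; the name match here is just the quick, hand-wavy version.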
FAQ
Did you use AI for any of this?
I used my computer.
Is the data accurate?
It's synthetic.
Will you get invited to any cocktail parties at the next OHDSI Symposium?
Probably not. This is an oversimplification of a complicated observational dataset, but it is not meant to be offensive.
Any closing statements?
Just vibing this module, even with the 3 prompts, I gained even further appreciation for the complex challenges the OHDSI community solves with this observational data.
