The omock package provides functionality to quickly create a cdm reference containing synthetic data based on population settings specified by the user.
First, let’s load the packages required for this vignette.
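A setup along these lines should cover the functions used in this vignette. It is a sketch resting on assumptions, not the authors' confirmed setup: `glimpse()` and `collect()` are taken to come from dplyr and the plotting functions from ggplot2. `emptyCdmReference()` is defined in omopgenerics, so that package is attached too in case omock does not re-export it.

```r
# Assumed package set for this vignette
library(omock)        # mockPerson(), mockObservationPeriod()
library(omopgenerics) # emptyCdmReference()
library(dplyr)        # glimpse(), collect()
library(ggplot2)      # ggplot(), geom_histogram(), theme_minimal(), xlab()
```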
Now, in three lines of code, we can create a cdm reference with a person and observation period table for 1000 people.
cdm <- emptyCdmReference(cdmName = "synthetic cdm") |>
  mockPerson(nPerson = 1000) |>
  mockObservationPeriod()
cdm
#>
#> ── # OMOP CDM reference (local) of synthetic cdm ───────────────────────────────
#> • omop tables: observation_period, person
#> • cohort tables: -
#> • achilles tables: -
#> • other tables: -
cdm$person |> glimpse()
#> Rows: 1,000
#> Columns: 18
#> $ person_id <int> 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13,…
#> $ gender_concept_id <int> 8532, 8507, 8507, 8507, 8507, 8507, 8532, …
#> $ year_of_birth <int> 1952, 1999, 1992, 1966, 1954, 1980, 1997, …
#> $ month_of_birth <int> 3, 5, 10, 5, 1, 11, 6, 1, 7, 2, 6, 11, 5, …
#> $ day_of_birth <int> 13, 14, 5, 26, 31, 27, 21, 28, 16, 10, 28,…
#> $ race_concept_id <int> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA…
#> $ ethnicity_concept_id <int> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA…
#> $ birth_datetime <dttm> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
#> $ location_id <int> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA…
#> $ provider_id <int> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA…
#> $ care_site_id <int> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA…
#> $ person_source_value <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA…
#> $ gender_source_value <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA…
#> $ gender_source_concept_id <int> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA…
#> $ race_source_value <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA…
#> $ race_source_concept_id <int> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA…
#> $ ethnicity_source_value <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA…
#> $ ethnicity_source_concept_id <int> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA…
cdm$observation_period |> glimpse()
#> Rows: 1,000
#> Columns: 5
#> $ person_id <int> 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 1…
#> $ observation_period_start_date <date> 1995-11-15, 2019-12-20, 2009-02-18, 201…
#> $ observation_period_end_date <date> 2016-05-12, 2019-12-30, 2011-09-12, 201…
#> $ observation_period_id <int> 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 1…
#> $ period_type_concept_id <int> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0…
We can add further requirements around the population we create. For example, we can require that everyone was born between 1960 and 1980, like so.
cdm <- emptyCdmReference(cdmName = "synthetic cdm") |>
  mockPerson(
    nPerson = 1000,
    birthRange = as.Date(c("1960-01-01", "1980-12-31"))
  ) |>
  mockObservationPeriod()
cdm$person |>
  collect() |>
  ggplot() +
  geom_histogram(aes(as.integer(year_of_birth)),
    binwidth = 1, colour = "grey"
  ) +
  theme_minimal() +
  xlab("Year of birth")
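Besides the histogram, the birth-year restriction can be checked numerically. The following is a minimal sketch rather than part of the package's documented workflow: it rebuilds the same mock cdm so that it runs on its own, calls `emptyCdmReference()` through the omopgenerics namespace (an assumption about where that function is exported), and stores the result in `birthYears`, a name introduced here for illustration.

```r
library(omock)
library(dplyr)

# Rebuild the mock cdm with the same birth range as in the example above.
# emptyCdmReference() is called via omopgenerics explicitly (assumption:
# that is the package exporting it).
cdm <- omopgenerics::emptyCdmReference(cdmName = "synthetic cdm") |>
  mockPerson(
    nPerson = 1000,
    birthRange = as.Date(c("1960-01-01", "1980-12-31"))
  ) |>
  mockObservationPeriod()

# Earliest and latest birth year found in the person table.
birthYears <- cdm$person |>
  collect() |>
  summarise(
    min_year = min(year_of_birth),
    max_year = max(year_of_birth)
  )
birthYears
```

If `birthRange` behaves as described, both values should fall between 1960 and 1980.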