Package 'PhenotypeR'


Type: Package
Title: Assess Study Cohorts Using a Common Data Model
Version: 0.5.0
Description: Phenotype study cohorts in data mapped to the Observational Medical Outcomes Partnership Common Data Model. Diagnostics are run at the database, code list, cohort, and population level to assess whether study cohorts are ready for research.
License: Apache License (≥ 2)
Encoding: UTF-8
Depends: R (≥ 4.1.0)
Suggests: CDMConnector (≥ 1.6.1), duckdb, DBI, gt, omock, testthat (≥ 3.0.0), knitr, glue, RPostgres, ggplot2, stringr, shiny (≥ 1.11.1), DiagrammeR, DiagrammeRsvg, reactable, rsvg, sortable, shinycssloaders, here, DT, bslib, shinyWidgets, plotly, tidyr, scales, usethis, rmarkdown, CohortSurvival (≥ 1.1.0), ellmer, htmltools, visOmopResults (≥ 1.4.2), rsconnect, cpp11, progress, qs2, lubridate, systemfonts, officer, fs, OmopConstructor, tools, jsonlite, jsonvalidate, shinyjs
Config/testthat/edition: 3
RoxygenNote: 7.3.3
Imports: cli, clock, CodelistGenerator (≥ 4.0.2), CohortCharacteristics (≥ 1.1.3), CohortConstructor (≥ 0.6.2), dplyr, DrugUtilisation (≥ 1.1.0), IncidencePrevalence (≥ 1.2.0), MeasurementDiagnostics (≥ 0.3.0), omopgenerics (≥ 1.2.0), OmopSketch (≥ 1.0.1), PatientProfiles (≥ 1.4.5), purrr, readr, rlang, vctrs
URL: https://ohdsi.github.io/PhenotypeR/
BugReports: https://github.com/OHDSI/PhenotypeR/issues
VignetteBuilder: knitr
Config/testthat/parallel: true
NeedsCompilation: no
Packaged: 2026-05-26 20:58:26 UTC; orms0426
Author: Edward Burn ORCID iD [aut, cre], Martí Català ORCID iD [aut], Xihang Chen ORCID iD [aut], Marta Alcalde-Herraiz ORCID iD [aut], Nuria Mercade-Besora ORCID iD [aut], Albert Prats-Uribe ORCID iD [aut]
Maintainer: Edward Burn <edward.burn@ndorms.ox.ac.uk>
Repository: CRAN
Date/Publication: 2026-05-27 06:30:02 UTC

PhenotypeR: Assess Study Cohorts Using a Common Data Model

Description

Phenotype study cohorts in data mapped to the Observational Medical Outcomes Partnership Common Data Model. Diagnostics are run at the database, code list, cohort, and population level to assess whether study cohorts are ready for research.

Author(s)

Maintainer: Edward Burn edward.burn@ndorms.ox.ac.uk (ORCID)

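Authors:

Martí Català (ORCID)
Xihang Chen (ORCID)
Marta Alcalde-Herraiz (ORCID)
Nuria Mercade-Besora (ORCID)
Albert Prats-Uribe (ORCID)

See Also

Useful links:

https://ohdsi.github.io/PhenotypeR/
Report bugs at https://github.com/OHDSI/PhenotypeR/issues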


Adds the cohort_codelist attribute to a cohort

Description

addCodelistAttribute() attaches a codelist to a cohort table in an OMOP CDM reference.

This matters mainly for codelistDiagnostics(), which assumes that the cohort it receives has a cohort_codelist attribute attached to it.

Usage

addCodelistAttribute(cohort, codelist, cohortName = names(codelist))

Arguments

cohort

Cohort table in a cdm reference

codelist

Named list of concepts

cohortName

For each element of the codelist, the name of the cohort in cohort to which the codelist refers
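The pairing between codelist elements and cohort names can be sketched in base R. This is only an illustration of the default shown in the Usage section (cohortName = names(codelist)); the objects codelistOther and cohortNameOther are hypothetical and exist only for this sketch.

```r
# A named list of concept IDs; by default the element name is the cohort name.
codelist <- list(warfarin = c(1310149L, 40163554L))

# Default pairing, as in the Usage signature: cohortName = names(codelist).
cohortName <- names(codelist)
print(cohortName)

# Hypothetical case: the codelist element is named differently from the cohort,
# so the cohort name is supplied explicitly, one entry per codelist element.
codelistOther <- list(warfarin_codes = c(1310149L, 40163554L))
cohortNameOther <- c("warfarin")
stopifnot(length(cohortNameOther) == length(codelistOther))
```

When the names already match the cohort names, the cohortName argument can be left at its default.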

Value

A cohort

Examples


library(omock)
library(CohortConstructor)
library(PhenotypeR)

cdm <- mockCdmFromDataset(source = "duckdb")
cdm$warfarin <- conceptCohort(cdm,
                              conceptSet =  list(warfarin = c(1310149L,
                                                              40163554L)),
                              name = "warfarin")

cohort <- addCodelistAttribute(cohort = cdm$warfarin,
               codelist = list("warfarin" = c(1310149L,  40163554L)))
attr(cohort, "cohort_codelist")

CDMConnector::cdmDisconnect(cdm)


Clinical description specification

Description

Clinical description specification

Usage

clinicalDescriptionSpecification(path = NULL)

Arguments

path

If NULL, specification will be returned as an R object. If a path to a directory is provided the specification will be exported.

Value

JSON specification for clinical descriptions


Run codelist-level diagnostics

Description

codelistDiagnostics() runs PhenotypeR diagnostics on the cohort_codelist attribute of the cohort, so this attribute must be populated. If it is missing, it can be populated with addCodelistAttribute().

codelistDiagnostics() also requires the achilles tables to be present in the cdm so that concept counts can be derived.

Usage

codelistDiagnostics(
  cohort,
  cohortId = NULL,
  achillesCodeUse = FALSE,
  orphanCodeUse = TRUE,
  cohortCodeUse = TRUE,
  drugDiagnostics = FALSE,
  drugDiagnosticsSample = 20000,
  measurementDiagnostics = FALSE,
  measurementDiagnosticsSample = 20000
)

Arguments

cohort

A cohort table in a cdm reference. The cohort_codelist attribute must be populated. The cdm reference must contain achilles tables as these will be used for deriving concept counts.

cohortId

Specific cohort definition ID for which to run codelist diagnostics.

achillesCodeUse

Whether to run CodelistGenerator::summariseAchillesCodeUse() (TRUE) or not (FALSE).

orphanCodeUse

Whether to run CodelistGenerator::summariseOrphanCodes() (TRUE) or not (FALSE).

cohortCodeUse

Whether to run CodelistGenerator::summariseCohortCodeUse() (TRUE) or not (FALSE).

drugDiagnostics

Whether to run drug diagnostics (TRUE) or not (FALSE). Note that, if set to TRUE, the diagnostics will only run if the cohort code list contains drug codes.

drugDiagnosticsSample

Number of people to randomly sample for drug diagnostics. If drugDiagnosticsSample = NULL, no sampling is performed. If drugDiagnosticsSample = 0, drug diagnostics are not run.

measurementDiagnostics

Whether to run measurement diagnostics (TRUE) or not (FALSE). Note that, if set to TRUE, the diagnostics will only run if the cohort code list contains measurement codes.

measurementDiagnosticsSample

Number of people to randomly sample for measurement diagnostics. If measurementDiagnosticsSample = NULL, no sampling is performed. If measurementDiagnosticsSample = 0, measurement diagnostics are not run.
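The three cases described for the sampling arguments (NULL, 0, or a positive number) can be summarised with a small base R sketch. The helper describeSample() is hypothetical and is not part of PhenotypeR; it only restates the documented behaviour in code form.

```r
# Hypothetical helper (not a PhenotypeR function): restate how a
# *DiagnosticsSample argument is documented to behave.
describeSample <- function(n) {
  if (is.null(n)) {
    "no sampling"
  } else if (n == 0) {
    "diagnostic not run"
  } else {
    sprintf("random sample of %d people", as.integer(n))
  }
}

describeSample(NULL)   # no sampling is performed
describeSample(0)      # the diagnostic is skipped
describeSample(20000)  # the default sample size in the Usage section
```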

Value

A summarised result

Examples


library(omock)
library(CohortConstructor)
library(PhenotypeR)

cdm <- mockCdmFromDataset(source = "duckdb")
cdm$warfarin <- conceptCohort(cdm,
                              conceptSet =  list(warfarin = c(1310149L,
                                                              40163554L)),
                              name = "warfarin")
result <- codelistDiagnostics(cdm$warfarin)

CDMConnector::cdmDisconnect(cdm = cdm)


Run cohort-level diagnostics

Description

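Runs PhenotypeR diagnostics on the cohort. The diagnostics include cohort counts and attrition, cohort characteristics (including age density), large-scale characteristics, cohort overlap and timing, and survival analysis. Each is switched on or off by the corresponding argument below.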

Usage

cohortDiagnostics(
  cohort,
  cohortId = NULL,
  cohortCount = TRUE,
  cohortCharacteristics = TRUE,
  largeScaleCharacteristics = TRUE,
  compareCohorts = FALSE,
  cohortSurvival = FALSE,
  cohortSample = 20000,
  matchedSample = 1000
)

Arguments

cohort

Cohort table in a cdm reference

cohortId

Specific cohort definition ID for which to run cohort diagnostics.

cohortCount

Whether to run CohortCharacteristics::summariseCohortCount() and CohortCharacteristics::summariseCohortAttrition() (TRUE) or not (FALSE).

cohortCharacteristics

Whether to run CohortCharacteristics::summariseCharacteristics() and summarise age density (TRUE) or not (FALSE).

largeScaleCharacteristics

Whether to run CohortCharacteristics::summariseLargeScaleCharacteristics() (TRUE) or not (FALSE).

compareCohorts

Whether to run CohortCharacteristics::summariseCohortOverlap() and CohortCharacteristics::summariseCohortTiming() (TRUE) or not (FALSE). Note that, if set to TRUE, these diagnostics are only run when there is more than one cohort.

cohortSurvival

Whether to run CohortSurvival::estimateSingleEventSurvival() (TRUE) or not (FALSE).

cohortSample

Number of people to randomly sample for cohortDiagnostics. If cohortSample = NULL, no sampling is performed.

matchedSample

Number of people to randomly sample for matching. If matchedSample = NULL, no sampling is performed. If matchedSample = 0, no matched cohorts are created.

Value

A summarised result

Examples



library(omock)
library(CohortConstructor)
library(PhenotypeR)
library(CDMConnector)

cdm <- mockCdmFromDataset(source = "duckdb")
cdm$warfarin <- conceptCohort(cdm,
                              conceptSet =  list(warfarin = c(1310149L,
                                                              40163554L)),
                              name = "warfarin")

result <- cohortDiagnostics(cdm$warfarin)

cdmDisconnect(cdm)


Helper for consistent documentation of cohort.

Description

Helper for consistent documentation of cohort.

Arguments

cohort

Cohort table in a cdm reference


Helper for consistent documentation of cohortSample.

Description

Helper for consistent documentation of cohortSample.

Arguments

cohortSample

Number of people to randomly sample for cohortDiagnostics. If cohortSample = NULL, no sampling is performed.


Data source description specification

Description

Data source description specification

Usage

dataSourceDescriptionSpecification(path = NULL)

Arguments

path

If NULL, specification will be returned as an R object. If a path to a directory is provided the specification will be exported.

Value

JSON specification for data source descriptions


Database diagnostics

Description

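PhenotypeR diagnostics on the cdm object.

Diagnostics include a snapshot of the cdm, a summary of the person table, a summary of observation periods, and, optionally, a summary of the clinical records in the tables where the codes associated with the cohort are found.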

Usage

databaseDiagnostics(
  cohort,
  cohortId = NULL,
  snapshot = TRUE,
  personTableSummary = TRUE,
  observationPeriodsSummary = TRUE,
  clinicalRecordsSummary = FALSE
)

Arguments

cohort

Cohort table in a cdm reference

cohortId

Specific cohort definition ID for which to run database diagnostics. This only affects the clinical records summary.

snapshot

Whether to run OmopSketch::summariseOmopSnapshot() (TRUE) or not (FALSE).

personTableSummary

Whether to run OmopSketch::summarisePerson() (TRUE) or not (FALSE).

observationPeriodsSummary

Whether to run OmopSketch::summariseObservationPeriod() (TRUE) or not (FALSE).

clinicalRecordsSummary

Whether to run OmopSketch::summariseClinicalRecords() on those clinical tables where the codes associated with your cohort are found (TRUE) or not (FALSE).

Value

A summarised result

Examples


library(omock)
library(PhenotypeR)
library(CohortConstructor)
library(CDMConnector)

cdm <- mockCdmFromDataset(source = "duckdb")

cdm$new_cohort <- conceptCohort(cdm,
                                conceptSet = list("codes" = c(40213201L, 4336464L)),
                                name = "new_cohort")

result <- databaseDiagnostics(cohort = cdm$new_cohort)

cdmDisconnect(cdm = cdm)


Helper for consistent documentation of directory.

Description

Helper for consistent documentation of directory.

Arguments

directory

Directory where to save report


Download a Clinical Description Template

Description

Download a Clinical Description Template

Usage

downloadClinicalDescriptionTemplate(
  directory,
  name = "clinical_description_template"
)

Arguments

directory

Directory where to download the clinical description.

name

Name of the Word file. Note that the file name must match the cohort names used in PhenotypeR diagnostics if you want to integrate the clinical description into the PhenotypeR Shiny app.

Value

A Word document with the template of the clinical description.

Examples


library(PhenotypeR)
library(here)

downloadClinicalDescriptionTemplate(directory = here(),
                                    name = "metformin")




Download a Database Description Template

Description

Download a Database Description Template

Usage

downloadDatabaseDescriptionTemplate(
  directory,
  name = "database_description_template"
)

Arguments

directory

Directory where to download the database description template.

name

Name of the Word file. Note that the file name must match the database names used in PhenotypeR diagnostics if you want to integrate the database description into the PhenotypeR Shiny app.

Value

A Word document with the template of the database description.

Examples


library(PhenotypeR)

downloadDatabaseDescriptionTemplate(directory = tempdir(),
                                    name = "GiBleed")



Draft clinical descriptions using an LLM

Description

Draft clinical descriptions using an LLM

Usage

draftClinicalDescription(chat, name, outputDir)

Arguments

chat

An ellmer chat

name

Clinical event of interest

outputDir

Folder to save clinical descriptions.

Value

Creates a draft clinical description for each event of interest.


Helper for consistent documentation of drugDiagnosticsSample.

Description

Helper for consistent documentation of drugDiagnosticsSample.

Arguments

drugDiagnosticsSample

Number of people to randomly sample for drug diagnostics. If drugDiagnosticsSample = NULL, no sampling is performed. If drugDiagnosticsSample = 0, drug diagnostics are not run.


Helper for consistent documentation of expectations.

Description

Helper for consistent documentation of expectations.

Arguments

expectations

Data frame or tibble with cohort expectations. It must contain the following columns: cohort_name, estimate, value, and source.


Get cohort expectations using an LLM

Description

Get cohort expectations using an LLM

Usage

getCohortExpectations(chat, phenotypes, outputDir)

Arguments

chat

An ellmer chat

phenotypes

Either a vector of phenotype names or results from PhenotypeR.

outputDir

Folder to save expectations.

Value

A tibble with expectations about the cohort.


Import clinical descriptions

Description

Import clinical descriptions

Usage

importClinicalDescription(path)

Arguments

path

Either a directory containing clinical descriptions or a path to a specific clinical description

Value

A list of clinical descriptions


Import database descriptions

Description

Import database descriptions

Usage

importDatabaseDescription(path)

Arguments

path

Either a directory containing database descriptions or a path to a specific database description

Value

A list of database descriptions


Helper for consistent documentation of matched.

Description

Helper for consistent documentation of matched.

Arguments

matchedSample

Number of people to randomly sample for matching. If matchedSample = NULL, no sampling is performed. If matchedSample = 0, no matched cohorts are created.


Helper for consistent documentation of measurementDiagnosticsSample.

Description

Helper for consistent documentation of measurementDiagnosticsSample.

Arguments

measurementDiagnosticsSample

Number of people to randomly sample for measurement diagnostics. If measurementDiagnosticsSample = NULL, no sampling is performed. If measurementDiagnosticsSample = 0, measurement diagnostics are not run.


Phenotype a cohort

Description

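Runs all the diagnostics offered in this package: database diagnostics (databaseDiagnostics()), codelist diagnostics (codelistDiagnostics()), cohort diagnostics (cohortDiagnostics()), and population diagnostics (populationDiagnostics()).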

Usage

phenotypeDiagnostics(
  cohort,
  databaseDiagnostics = list(),
  codelistDiagnostics = list(),
  cohortDiagnostics = list(),
  populationDiagnostics = list(),
  stagingDirectory = NULL
)

Arguments

cohort

Cohort table in a cdm reference

databaseDiagnostics

A list of arguments passed to databaseDiagnostics(). If the list is empty, the default values are used. Example: in the following, all diagnostics are run except the person table summary: databaseDiagnostics = list("personTableSummary" = FALSE)

codelistDiagnostics

A list of arguments passed to codelistDiagnostics(). If the list is empty, the default values are used. Example: in the following, all diagnostics are run, with a subsample of 1,000 participants for measurement diagnostics and an independent subsample of 500 participants for drug diagnostics: codelistDiagnostics = list("measurementDiagnosticsSample" = 1000, "drugDiagnosticsSample" = 500)

cohortDiagnostics

A list of arguments passed to cohortDiagnostics(). If the list is empty, the default values are used. Example: cohortDiagnostics = list("cohortSurvival" = TRUE)

populationDiagnostics

A list of arguments passed to populationDiagnostics(). If the list is empty, the default values are used. Example: in the following, all diagnostics are run and a subsample of 100,000 participants is used: populationDiagnostics = list("populationSample" = 100000)

stagingDirectory

Path to folder to save incremental results and log file
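The idea of overriding only some defaults through a list can be sketched with base R. The defaults below are copied from the databaseDiagnostics() Usage section (cohortId is left out of the sketch); how PhenotypeR combines the list with its defaults internally is not stated here, so the use of utils::modifyList() is an assumption made purely for illustration.

```r
# Defaults as shown in the databaseDiagnostics() Usage section.
defaults <- list(
  snapshot = TRUE,
  personTableSummary = TRUE,
  observationPeriodsSummary = TRUE,
  clinicalRecordsSummary = FALSE
)

# The list a user would supply to phenotypeDiagnostics(databaseDiagnostics = ...).
overrides <- list(personTableSummary = FALSE)

# Assumed merge rule for this sketch: supplied values replace defaults,
# everything else keeps its default.
effective <- utils::modifyList(defaults, overrides)
str(effective)
```

An empty list leaves every default untouched, which matches the statement that default values are used when the list is empty.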

Value

A summarised result

Examples


library(omock)
library(CohortConstructor)
library(PhenotypeR)

cdm <- mockCdmFromDataset(source = "duckdb")
cdm$warfarin <- conceptCohort(cdm,
                              conceptSet =  list(warfarin = c(1310149L,
                                                              40163554L)),
                              name = "warfarin")
result <- phenotypeDiagnostics(cdm$warfarin)



Population-level diagnostics

Description

PhenotypeR diagnostics on the input cohort in relation to a denominator population. Diagnostics include incidence and period prevalence, each switched on or off by the corresponding argument below.

Usage

populationDiagnostics(
  cohort,
  cohortId = NULL,
  incidence = TRUE,
  periodPrevalence = TRUE,
  populationSample = 1e+05,
  populationDateRange = as.Date(c(NA, NA))
)

Arguments

cohort

Cohort table in a cdm reference

cohortId

Specific cohort definition ID for which to run population diagnostics.

incidence

Whether to run IncidencePrevalence::estimateIncidence() (TRUE) or not (FALSE).

periodPrevalence

Whether to run IncidencePrevalence::estimatePeriodPrevalence() (TRUE) or not (FALSE).

populationSample

Number of people from the cdm to sample. If NULL no sampling will be performed. Sample will be within populationDateRange if specified.

populationDateRange

Two dates. The first indicates the earliest cohort start date and the second the latest possible cohort end date. If NULL or the first date is missing, the earliest observation_period_start_date in the observation_period table is used for the former. If NULL or the second date is missing, the latest observation_period_end_date in the observation_period table is used for the latter.
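A populationDateRange value is a length-two Date vector in which either bound may be missing. The base R sketch below shows the three shapes this can take; the specific dates are illustrative placeholders, not recommendations.

```r
# Default from the Usage section: both bounds missing.
openRange <- as.Date(c(NA, NA))

# Only the start is fixed; the end is left missing.
fromStart <- as.Date(c("2010-01-01", NA))

# Both bounds fixed.
closedRange <- as.Date(c("2010-01-01", "2019-12-31"))

# Each value is a Date vector of length two.
stopifnot(
  inherits(openRange, "Date"), length(openRange) == 2,
  inherits(fromStart, "Date"), length(fromStart) == 2,
  inherits(closedRange, "Date"), length(closedRange) == 2
)
```

Any of these could then be supplied as populationDiagnostics(cohort, populationDateRange = fromStart).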

Value

A summarised result

Examples


library(omock)
library(CohortConstructor)
library(PhenotypeR)
library(CDMConnector)

cdm <- mockCdmFromDataset(source = "duckdb")
cdm$warfarin <- conceptCohort(cdm,
                              conceptSet =  list(warfarin = c(1310149L,
                                                              40163554L)),
                              name = "warfarin")

result <- cdm$warfarin |>
  populationDiagnostics(populationSample = 100000)

cdmDisconnect(cdm = cdm)


Helper for consistent documentation of populationSample.

Description

Helper for consistent documentation of populationSample.

Arguments

populationSample

Number of people from the cdm to sample. If NULL no sampling will be performed. Sample will be within populationDateRange if specified.

populationDateRange

Two dates. The first indicates the earliest cohort start date and the second the latest possible cohort end date. If NULL or the first date is missing, the earliest observation_period_start_date in the observation_period table is used for the former. If NULL or the second date is missing, the latest observation_period_end_date in the observation_period table is used for the latter.


Objects exported from other packages

Description

These objects are imported from other packages. Follow the links below to see their documentation.

CodelistGenerator

summariseAchillesCodeUse, summariseCodeUse, summariseCohortCodeUse, summariseOrphanCodes

omopgenerics

bind, exportSummarisedResult, importSummarisedResult, settings, suppress


Helper for consistent documentation of result.

Description

Helper for consistent documentation of result.

Arguments

result

A summarised result


Shiny app to create clinical descriptions for contextualising diagnostic results

Description

Shiny app to create clinical descriptions for contextualising diagnostic results

Usage

shinyClinicalDescriptions(directory, open = rlang::is_interactive())

Arguments

directory

Directory where to save shiny app

open

If TRUE, the shiny app will be launched in a new session. If FALSE, the shiny app will be created but not launched.

Value

Shiny app

Examples


shinyClinicalDescriptions(tempdir())


Shiny app to create data source descriptions for contextualising diagnostic results

Description

Shiny app to create data source descriptions for contextualising diagnostic results

Usage

shinyDataSourceDescriptions(directory, open = rlang::is_interactive())

Arguments

directory

Directory where to save shiny app

open

If TRUE, the shiny app will be launched in a new session. If FALSE, the shiny app will be created but not launched.

Value

Shiny app

Examples


shinyDataSourceDescriptions(tempdir())


Create a shiny app summarising your phenotyping results

Description

A shiny app designed for any diagnostic results from PhenotypeR. This includes results from databaseDiagnostics(), codelistDiagnostics(), cohortDiagnostics(), and populationDiagnostics().

Usage

shinyDiagnostics(
  result,
  directory,
  minCellCount = 5,
  open = rlang::is_interactive(),
  expectationsDir = NULL,
  clinicalDescriptionsDir = NULL,
  databaseDescriptionsDir = NULL,
  removeEmptyTabs = TRUE
)

Arguments

result

A summarised result

directory

Directory where to save report

minCellCount

Minimum cell count for suppression when exporting results.

open

If TRUE, the shiny app will be launched in a new session. If FALSE, the shiny app will be created but not launched.

expectationsDir

Directory where to find the expectations CSV.

clinicalDescriptionsDir

Directory where to find the clinical description Word documents.

databaseDescriptionsDir

Directory where to find the database description Word documents.

removeEmptyTabs

Whether to remove the tabs of diagnostics that were not performed or that had insufficient counts to produce a result (TRUE) or not (FALSE).
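The effect of minCellCount can be pictured with a simplified base R sketch of small-cell suppression. The helper suppressSmallCounts() is hypothetical, and the rule it applies (counts above zero and below minCellCount are masked) is an assumption for illustration, not a description of how the package performs suppression.

```r
# Hypothetical helper: mask counts that are non-zero but below the threshold.
suppressSmallCounts <- function(counts, minCellCount = 5) {
  masked <- counts
  masked[counts > 0 & counts < minCellCount] <- NA
  masked
}

# With the default threshold of 5, only the count of 3 is masked.
suppressSmallCounts(c(0, 3, 5, 120))
```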

Value

A shiny app

Examples


library(omock)
library(CohortConstructor)
library(PhenotypeR)

cdm <- mockCdmFromDataset(source = "duckdb")
cdm$warfarin <- conceptCohort(cdm,
                              conceptSet =  list(warfarin = c(1310149L,
                                                              40163554L)),
                              name = "warfarin")

result <- phenotypeDiagnostics(cdm$warfarin,
                               populationDiagnostics = list("populationSample" = 100000))

shinyDiagnostics(result,
                 tempdir())

CDMConnector::cdmDisconnect(cdm = cdm)


Helper for consistent documentation of survival.

Description

Helper for consistent documentation of survival.

Arguments

survival

TRUE or FALSE. Whether to conduct survival analysis (TRUE) or not (FALSE).


Create a table summarising cohort expectations

Description

Create a table summarising cohort expectations

Usage

tableCohortExpectations(expectations, type = "reactable")

Arguments

expectations

Data frame or tibble with cohort expectations. It must contain the following columns: cohort_name, estimate, value, and source.

type

Table type to view results. See visOmopResults::tableType() for supported tables.
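The required shape of the expectations input can be sketched in base R. The rows below are placeholder values invented for illustration only; the column names are the four listed in the argument description.

```r
# Illustrative expectations table; every value here is a placeholder.
expectations <- data.frame(
  cohort_name = c("warfarin", "warfarin"),
  estimate = c("Median age", "Proportion male"),
  value = c("placeholder text", "placeholder text"),
  source = c("illustrative", "illustrative"),
  stringsAsFactors = FALSE
)

# Check that the four required columns are present before passing it on.
requiredCols <- c("cohort_name", "estimate", "value", "source")
missingCols <- setdiff(requiredCols, names(expectations))
stopifnot(length(missingCols) == 0)
```

A data frame of this shape could then be passed as tableCohortExpectations(expectations).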

Value

Summary of cohort expectations