
Each variable that CDC expects in a submission has a finite set of allowable values. It is up to the user to implement the correct logic as defined in the data dictionaries and how-to guides.

If your logic is sound, check_data_content() will shoulder some of the heavy lifting and give its best guess as to whether each variable is formatted correctly. This is a good way to quickly check your data before submitting it to CDC, but it is not a replacement for CDC's EPHT Test Submission portal.

Usage

check_data_content(data, content_group_id)

Arguments

data

Pre-wrangled data frame.

content_group_id

Code identifying the content group, as found in EPHT documentation (for example, "AS-HOSP").

Value

A list of exit statuses, one per checked variable. Each element holds a code and a message.

Data

Users are expected to wrangle, aggregate, and otherwise implement the logic expected by EPHT themselves. distiller does not handle that step, though it does provide some helpers: collapse_race(), collapse_ethnicity(), and make_months_worse().

For distiller to work properly, the data must meet the following expectations:

  • The data must be a data frame or tibble

  • For all content_group_id values, the data must have the following columns (in any order):

    • month: character - acceptable values: "01", "02", "03" ... "12"

    • agegroup: numeric - acceptable values: 1-19

    • county: character - string length of 5, unless unknown, then county = "U"

    • ethnicity: character - acceptable values: "H", "NH", "U"

    • race: character - acceptable values: "W", "B", "O", "U"

    • health_outcome_id: numeric - acceptable values: 1-5

    • sex: character - acceptable values: "M", "F", "U"

    • year: numeric - acceptable values: 2001-9999

    • monthly_count: numeric - acceptable values: >0 and no missing values

  • For content_group_id "CO-ED" and "CO-HOSP", the data must also have these additional columns:

    • fire_count: numeric - acceptable values: >0 and no missing values

    • nonfire_count: numeric - acceptable values: >0 and no missing values

    • unknown_count: numeric - acceptable values: >0 and no missing values
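
The expectations above can be sketched as a small data frame. This is a minimal illustration only: the rows are made-up values chosen to fall inside the listed ranges, and the base R spot checks are an assumption about how one might eyeball the rules by hand, not the logic that check_data_content() runs internally.

```r
# Hypothetical rows; every value is made up for illustration and chosen
# to sit inside the acceptable ranges listed above.
data <- data.frame(
  month = c("01", "12"),          # character, zero-padded
  agegroup = c(1, 19),            # numeric, 1-19
  county = c("06001", "U"),       # 5 characters, or "U" when unknown
  ethnicity = c("H", "NH"),
  race = c("W", "U"),
  health_outcome_id = c(1, 5),    # numeric, 1-5
  sex = c("F", "M"),
  year = c(2019, 2020),           # numeric, 2001-9999
  monthly_count = c(12, 7),       # numeric, > 0, no missing values
  stringsAsFactors = FALSE
)

# Base R spot checks mirroring a few of the listed rules
# (not the package's own implementation).
month_ok  <- all(data$month %in% sprintf("%02d", 1:12))
county_ok <- all(nchar(data$county) == 5 | data$county == "U")
count_ok  <- !anyNA(data$monthly_count) && all(data$monthly_count > 0)
```

For "CO-ED" and "CO-HOSP", the same pattern would be extended with fire_count, nonfire_count, and unknown_count columns.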

Submission Check

If users set check_first = TRUE in make_xml_document(), or run check_submission() or any of the other check_* functions, then a suite of checks is run against the metadata, data structure, and data content. Users do not need to run the whole suite of checks; they can run each function piecemeal on their data as it is being prepared.

check_submission() is called which is a wrapper around the following functions:

Examples

data <-
  mtcars |>
  dplyr::rename(
    month = mpg,
    agegroup = cyl,
    county = disp,
    ethnicity = hp,
    health_outcome_id = drat,
    monthly_count = wt,
    race = qsec,
    sex = vs,
    year = am
  ) |>
  dplyr::select(-c(gear, carb))

check_data_content(data, "AS-HOSP")
#> $check_month_var
#> $check_month_var$code
#> [1] 1
#> 
#> $check_month_var$message
#> Danger: month does not have allowable value/s
#> Troublemakers: allowed_values
#> 
#> 
#> $check_agegroup_var
#> $check_agegroup_var$code
#> [1] 0
#> 
#> $check_agegroup_var$message
#> Success: agegroup
#> 
#> 
#> $check_county_var
#> $check_county_var$code
#> [1] 1
#> 
#> $check_county_var$message
#> Danger: county does not have allowable value/s
#> Troublemakers: length
#> 
#> 
#> $check_ethnicity_var
#> $check_ethnicity_var$code
#> [1] 1
#> 
#> $check_ethnicity_var$message
#> Danger: ethnicity does not have allowable value/s
#> Troublemakers: allowed_values
#> 
#> 
#> $check_health_outcome_id_var
#> $check_health_outcome_id_var$code
#> [1] 1
#> 
#> $check_health_outcome_id_var$message
#> Danger: health_outcome_id does not have allowable value/s
#> Troublemakers: allowed_values
#> 
#> 
#> $check_sex_var
#> $check_sex_var$code
#> [1] 1
#> 
#> $check_sex_var$message
#> Danger: sex does not have allowable value/s
#> Troublemakers: allowed_values
#> 
#> 
#> $check_year_var
#> $check_year_var$code
#> [1] 1
#> 
#> $check_year_var$message
#> Danger: year does not have allowable value/s
#> Troublemakers: allowed_values
#> 
#> 
#> $check_race_var
#> $check_race_var$code
#> [1] 1
#> 
#> $check_race_var$message
#> Danger: race does not have allowable value/s
#> Troublemakers: allowed_values
#> 
#> 
#> $check_monthly_count_var
#> $check_monthly_count_var$code
#> [1] 0
#> 
#> $check_monthly_count_var$message
#> Success: monthly_count
#> 
#>
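
Because the result is a named list with a code and a message per check, the failing checks can be pulled out programmatically. The sketch below uses a hand-built stand-in list (the name results is hypothetical) that copies only the code/message layout seen in the output above, and it stores the messages as plain character strings, which is an assumption about their type.

```r
# Hand-built stand-in for the list returned by check_data_content().
# Only the code/message layout is taken from the output above; storing
# the messages as plain strings is an assumption.
results <- list(
  check_month_var = list(
    code = 1,
    message = "Danger: month does not have allowable value/s"
  ),
  check_agegroup_var = list(
    code = 0,
    message = "Success: agegroup"
  )
)

# Extract the exit code of each check into a named numeric vector.
codes <- vapply(results, function(x) x$code, numeric(1))

# Names of the checks that did not return 0.
failed <- names(codes)[codes != 0]
failed
```

The same vapply() call should work on a real result as long as each element carries a single numeric code.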