Each variable that CDC requires in a submission accepts only a finite set of values. It is up to the user to implement the correct logic as defined in the data dictionaries and how-to guides.
However, if your logic is sound, check_data_content() will shoulder some
of the heavy lifting and give its best guess as to whether each
variable is correctly formatted. This is
a good way to quickly check your data before submitting it to the CDC.
It is not a replacement for the CDC's EPHT Test Submission portal.
Data
Users are expected to wrangle, aggregate, and otherwise implement the logic
expected by EPHT themselves. distiller will not handle that step, though it
does provide some helpers: collapse_race(), collapse_ethnicity(), and
make_months_worse().
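As a rough illustration of the wrangling step that is left to the user, the sketch below aggregates line-level records into monthly counts using base R only. The records data frame, its values, and the helper column n are hypothetical, and the zero-padding via sprintf() is a base R stand-in; it is not meant to describe how distiller's helpers work.

```r
# Hypothetical line-level records, one row per event (values are made up).
records <- data.frame(
  month = c(1L, 1L, 2L, 2L, 2L),
  year = c(2020, 2020, 2020, 2020, 2020),
  county = c("49035", "49035", "49035", "49049", "49049"),
  stringsAsFactors = FALSE
)

# Zero-pad month so it matches the documented character form "01" ... "12".
records$month <- sprintf("%02d", records$month)

# Count one event per row, then sum within month/year/county.
records$n <- 1
monthly <- aggregate(n ~ month + year + county, data = records, FUN = sum)

# Rename the summed column to the name the checks expect.
names(monthly)[names(monthly) == "n"] <- "monthly_count"
monthly
```

A real submission would group by every required column (agegroup, sex, race, and so on), not just the three used here.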
For distiller to work properly, the data must meet the following expectations:
The data must be a data frame or tibble.
For all content_group_id values, the data must have the following columns (in any order):
month: character - acceptable values: "01", "02", "03" ... "12"
agegroup: numeric - acceptable values: 1-19
county: character - string length of 5, unless unknown, then county = "U"
ethnicity: character - acceptable values: "H", "NH", "U"
race: character - acceptable values: "W", "B", "O", "U"
health_outcome_id: numeric - acceptable values: 1-5
sex: character - acceptable values: "M", "F", "U"
year: numeric - acceptable values: 2001-9999
monthly_count: numeric - acceptable values: >0 and no missing values
For content_group_id "CO-ED" and "CO-HOSP", the data must have the following additional columns:
fire_count: numeric - acceptable values: >0 and no missing values
nonfire_count: numeric - acceptable values: >0 and no missing values
unknown_count: numeric - acceptable values: >0 and no missing values
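The column rules above can be restated as a small base R sketch. This is only an illustration of the documented expectations, not distiller's implementation; the toy data and the rules list are hypothetical names introduced here.

```r
# Hypothetical toy data; values chosen only to satisfy the documented ranges.
data <- data.frame(
  month = c("01", "02", "12"),
  agegroup = c(1, 10, 19),
  county = c("49035", "49049", "U"),
  ethnicity = c("H", "NH", "U"),
  race = c("W", "B", "O"),
  health_outcome_id = c(1, 3, 5),
  sex = c("M", "F", "U"),
  year = c(2001, 2015, 2020),
  monthly_count = c(4, 12, 7),
  stringsAsFactors = FALSE
)

# One predicate per column, mirroring the list above (assumed restatement).
rules <- list(
  month = function(x) is.character(x) && all(x %in% sprintf("%02d", 1:12)),
  agegroup = function(x) is.numeric(x) && all(x %in% 1:19),
  county = function(x) is.character(x) && all(nchar(x) == 5 | x == "U"),
  ethnicity = function(x) is.character(x) && all(x %in% c("H", "NH", "U")),
  race = function(x) is.character(x) && all(x %in% c("W", "B", "O", "U")),
  health_outcome_id = function(x) is.numeric(x) && all(x %in% 1:5),
  sex = function(x) is.character(x) && all(x %in% c("M", "F", "U")),
  year = function(x) is.numeric(x) && all(x >= 2001 & x <= 9999),
  monthly_count = function(x) is.numeric(x) && !anyNA(x) && all(x > 0)
)

# Apply each rule to its column; TRUE means the column meets the stated rule.
passes <- vapply(names(rules), function(v) rules[[v]](data[[v]]), logical(1))
passes
```

For "CO-ED" and "CO-HOSP" data, the monthly_count rule could be reused for fire_count, nonfire_count, and unknown_count. Data that passes a sketch like this should still be run through check_data_content().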
Submission Check
If users set check_first = TRUE in make_xml_document(), or run
check_submission() or any of the other check_* functions, then a suite
of checks is run against the metadata, data structure, and data content.
Please note that users do not need to run the whole suite of checks; they can
run each function piecemeal on their data as it is being prepared.
In that case check_submission() is called, which is a wrapper around
the individual check_* functions.
See also
Other checks: check_content_group_id(), check_data(), check_jurisdiction_code(), check_mcn(), check_state_fips_code(), check_submission(), check_submitter_email(), check_submitter_name(), check_submitter_title()
Examples
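library(distiller)

# Deliberately malformed input: renaming mtcars columns gives the expected
# column names, but most of the types and values are wrong, so most checks fail.
data <-
  mtcars |>
  dplyr::rename(
    month = mpg,
    agegroup = cyl,
    county = disp,
    ethnicity = hp,
    health_outcome_id = drat,
    monthly_count = wt,
    race = qsec,
    sex = vs,
    year = am
  ) |>
  dplyr::select(-c(gear, carb))

check_data_content(data, "AS-HOSP")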
#> $check_month_var
#> $check_month_var$code
#> [1] 1
#>
#> $check_month_var$message
#> Danger: month does not have allowable value/s
#> Troublemakers: allowed_values
#>
#>
#> $check_agegroup_var
#> $check_agegroup_var$code
#> [1] 0
#>
#> $check_agegroup_var$message
#> Success: agegroup
#>
#>
#> $check_county_var
#> $check_county_var$code
#> [1] 1
#>
#> $check_county_var$message
#> Danger: county does not have allowable value/s
#> Troublemakers: length
#>
#>
#> $check_ethnicity_var
#> $check_ethnicity_var$code
#> [1] 1
#>
#> $check_ethnicity_var$message
#> Danger: ethnicity does not have allowable value/s
#> Troublemakers: allowed_values
#>
#>
#> $check_health_outcome_id_var
#> $check_health_outcome_id_var$code
#> [1] 1
#>
#> $check_health_outcome_id_var$message
#> Danger: health_outcome_id does not have allowable value/s
#> Troublemakers: allowed_values
#>
#>
#> $check_sex_var
#> $check_sex_var$code
#> [1] 1
#>
#> $check_sex_var$message
#> Danger: sex does not have allowable value/s
#> Troublemakers: allowed_values
#>
#>
#> $check_year_var
#> $check_year_var$code
#> [1] 1
#>
#> $check_year_var$message
#> Danger: year does not have allowable value/s
#> Troublemakers: allowed_values
#>
#>
#> $check_race_var
#> $check_race_var$code
#> [1] 1
#>
#> $check_race_var$message
#> Danger: race does not have allowable value/s
#> Troublemakers: allowed_values
#>
#>
#> $check_monthly_count_var
#> $check_monthly_count_var$code
#> [1] 0
#>
#> $check_monthly_count_var$message
#> Success: monthly_count
#>
#>
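The printed result above appears to be a named list in which each element carries a code and a message. Assuming that shape, and assuming that a code of 0 means success and any other code means failure (both inferred from the output, not confirmed by the documentation), the failing checks could be pulled out as sketched below. The result object here is a hand-built mock with plain character messages, not real distiller output.

```r
# Mock of the result structure shown above (hypothetical, hand-built).
result <- list(
  check_month_var = list(
    code = 1,
    message = "Danger: month does not have allowable value/s"
  ),
  check_agegroup_var = list(
    code = 0,
    message = "Success: agegroup"
  ),
  check_county_var = list(
    code = 1,
    message = "Danger: county does not have allowable value/s"
  )
)

# Collect the code from every check, keeping the check names.
codes <- vapply(result, function(x) x$code, numeric(1))

# Names of the checks whose code is non-zero (assumed to mean failure).
failed <- names(codes)[codes != 0]
failed
```

With a real result, the same two lines after the mock would give a quick list of the variables that still need attention.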