
Package index
Importing
Survey data, i.e., data derived from questionnaires or from systematic data collection such as inspecting objects in nature or recording prices at shops, are usually stored in databases and converted to complex files that retain at least the coding and labelling metadata together with the data. These files must be imported into R so that the harmonization tasks can be carried out with the appropriate R types.
- read_surveys() read_survey() - Read survey file(s)
- read_rds() - Read rds file
- read_spss() - Read SPSS (`.sav`, `.zsav`, `.por`) files. Write `.sav` and `.zsav` files.
- read_dta() - Read Stata DTA (`.dta`) files
- read_csv() - Read csv file
- pull_survey() - Pull a survey from a survey list
Harmonizing concepts with metadata
After importing data with descriptive metadata such as numerical coding and labelling, we need to create a map of the information in our R session to prepare a harmonization plan. We must find information related to sufficiently similar concepts that can be harmonized and joined into a single variable; eventually, a table of such harmonized variables is joined.
- metadata_create() metadata_waves_create() - Create a metadata table from several surveys
- metadata_survey_create() - Create a metadata table
- retroharmonize - retroharmonize: Retrospective harmonization of survey data files
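The idea of mapping what is in the R session can be illustrated with a minimal base R sketch. It does not call the package's metadata functions; the toy surveys, the helper `describe_survey()` and the column names of the resulting table are assumptions made for this illustration only.

```r
# Two toy surveys held in a named list (stand-ins for imported survey files).
surveys <- list(
  wave_1 = data.frame(rowid = 1:3, qa10_1 = c(1, 0, 1)),
  wave_2 = data.frame(rowid = 1:2, qd3_4 = c(0, 1), age = c(34L, 51L))
)

# Hypothetical helper: describe every variable of one survey in a data frame.
describe_survey <- function(id, df) {
  data.frame(
    survey     = id,
    var_name   = names(df),
    class_orig = vapply(df, function(v) class(v)[1], character(1)),
    n_unique   = vapply(df, function(v) length(unique(v)), integer(1)),
    row.names  = NULL,
    stringsAsFactors = FALSE
  )
}

# One row per variable per survey: a simple map for planning harmonization.
metadata <- do.call(rbind, Map(describe_survey, names(surveys), surveys))
rownames(metadata) <- NULL
print(metadata)
```

A table like this makes it easier to spot variables that measure the same concept under different names or classes before any joining is attempted.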
Codebooks
The new functions will follow the DDI and SDMX terminology. See the vignette Harmonizing Concepts, Questions, and Variables.
- create_codebook() codebook_waves_create() codebook_surveys_create() - Create a survey codebook
Harmonize variable names
Before joining variables containing responses about the same concept, make sure that they have identical names in the re-processed surveys. See the vignette Working with a Crosswalk Table for examples and further clarification.
- harmonize_var_names() - Harmonize the variable names of surveys
- label_normalize() var_label_normalize() val_label_normalize() - Normalize value and variable labels
- harmonize_survey_variables() - Read a survey from a CSV file
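Why identical names matter can be shown with a minimal base R sketch. It is a conceptual illustration and not the package's implementation; the toy surveys, the crosswalk columns (`survey`, `var_orig`, `var_target`) and the helper `rename_with_crosswalk()` are hypothetical.

```r
# Two toy surveys that store the same concept under different variable names.
survey_a <- data.frame(rowid = 1:3, qa10_1 = c(1, 0, 1))
survey_b <- data.frame(rowid = 1:2, qd3_4 = c(0, 1))

# Hypothetical crosswalk: original names mapped to harmonized target names.
crosswalk <- data.frame(
  survey     = c("a", "a", "b", "b"),
  var_orig   = c("rowid", "qa10_1", "rowid", "qd3_4"),
  var_target = c("rowid", "trust_parliament", "rowid", "trust_parliament"),
  stringsAsFactors = FALSE
)

rename_with_crosswalk <- function(df, survey_id, crosswalk) {
  map <- crosswalk[crosswalk$survey == survey_id, ]
  idx <- match(names(df), map$var_orig)
  # Keep the original name when the crosswalk has no entry for it.
  names(df) <- ifelse(is.na(idx), names(df), map$var_target[idx])
  df
}

harmonized_a <- rename_with_crosswalk(survey_a, "a", crosswalk)
harmonized_b <- rename_with_crosswalk(survey_b, "b", crosswalk)

# With identical names, the surveys can be stacked row-wise.
joined <- rbind(harmonized_a, harmonized_b)
print(joined)
```

Without the renaming step, `rbind()` would fail because the column names of the two data frames do not match.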
Harmonize numerical codes and labels
To merge variables from different surveys into a single variable, you must make sure that the numerical codes and labels, for example 0=‘no’ and 1=‘yes’, are processed identically. See the vignette Harmonize Value Labels for examples and further clarification.
- collect_val_labels() collect_na_labels() - Collect labels from metadata file
- harmonize_values() - Harmonize the values and labels of labelled vectors
- harmonize_survey_values() harmonize_waves() - Harmonize values in surveys
- merge_surveys() merge_waves() - Merge surveys
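The requirement that codes and labels be processed identically can be sketched in base R as follows. This is a conceptual example that does not use the package's functions; the `labels` attribute only mimics the value labels of a labelled vector, and the target coding and the helper `recode_to_target()` are assumptions.

```r
# Toy variables: the same yes/no question coded differently in two surveys.
# The "labels" attribute mimics the value labels kept by labelled vectors.
x1 <- structure(c(1, 0, 1), labels = c(Yes = 1, No = 0))
x2 <- structure(c(2, 1, 2), labels = c(NO = 1, YES = 2))

# Target coding shared by all surveys (an assumption for this example).
target <- c(no = 0, yes = 1)

recode_to_target <- function(x, target) {
  labs <- attr(x, "labels")
  # Look up the label of each value and normalize it to lower case.
  value_labels <- tolower(names(labs)[match(as.vector(x), labs)])
  # Map the normalized label to the target numeric code.
  structure(unname(target[value_labels]), labels = target)
}

h1 <- recode_to_target(x1, target)
h2 <- recode_to_target(x2, target)

# After recoding, both variables share one coding and can be combined.
combined <- c(h1, h2)
print(combined)
```

Combining `x1` and `x2` directly would mix two incompatible codings, because the code 1 means 'yes' in the first variable and 'no' in the second.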
Harmonize missing and special cases
Some variable codes have a special meaning, such as the various labels of missing values, which need to be converted differently to a numeric, factor or character representation. See the vignette Harmonize Value Labels for examples and further clarification.
- collect_val_labels() collect_na_labels() - Collect labels from metadata file
- na_range_to_values() - Harmonize SPSS-style missing value ranges
- harmonize_na_values() - Harmonize na_values in haven_labelled_spss
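How special codes may be treated differently in each representation can be sketched in base R. The codes 8 and 9 and their labels are invented for this illustration, and the conversions shown are one plausible convention rather than the package's exact behaviour.

```r
# Toy variable: 8 = "do_not_know" and 9 = "declined" are special missing codes.
x         <- c(1, 0, 8, 1, 9)
labels    <- c(no = 0, yes = 1, do_not_know = 8, declined = 9)
na_values <- c(8, 9)

is_special <- x %in% na_values

# Numeric: special codes become NA so that averages are not distorted.
x_num <- ifelse(is_special, NA_real_, x)

# Factor: special codes are kept as explicit categories.
x_fct <- factor(x, levels = labels, labels = names(labels))

# Character: the label text, which is convenient for plots and tables.
x_chr <- as.character(x_fct)

print(mean(x_num, na.rm = TRUE))
print(table(x_fct))
```

Treating 8 and 9 as ordinary numbers would inflate the mean, while dropping them from the factor would hide the difference between the two kinds of non-response.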
Crosswalk
Lay out the harmonization crosswalk scheme, which unifies variable names, codes and labels. See the vignette Working with a Crosswalk Table for examples and further clarification.
- is.crosswalk_table() crosswalk_table_create() - Validate a crosswalk table
- crosswalk_surveys() crosswalk() - Crosswalk and harmonize surveys
Subsetting
Remove variables that cannot be harmonized in your workflow either in memory (faster for smaller tasks) or sequentially from files. See the vignette Working with a Crosswalk Table for examples and further clarification.
- subset_surveys() subset_waves() subset_save_surveys() - Subset and optionally harmonize surveys
- document_survey_item() - Document survey item harmonization
- document_surveys() document_waves() - Document survey lists
- create_codebook() codebook_waves_create() codebook_surveys_create() - Create a survey codebook
Type conversion
Consistently treat labels, missing value ranges, and missing value labels imported from SPSS, Stata or other sources so that you can use R's statistical functions, which mainly work with the base classes numeric or factor. For data visualization, the base class character may be preferred. See the vignette The labelled_spss_survey class for further information.
- survey() is.survey() summary(<survey>) - Create a survey data frame
- is.survey_df() survey_df() print(<survey_df>) - Create a survey object
- labelled_spss_survey() `[`(<retroharmonize_labelled_spss_survey>) print(<retroharmonize_labelled_spss_survey>) summary(<retroharmonize_labelled_spss_survey>) is.na(<retroharmonize_labelled_spss_survey>) levels(<retroharmonize_labelled_spss_survey>) `names<-`(<retroharmonize_labelled_spss_survey>) format(<retroharmonize_labelled_spss_survey>) is.labelled_spss_survey() median(<retroharmonize_labelled_spss_survey>) quantile(<retroharmonize_labelled_spss_survey>) weighted.mean(<retroharmonize_labelled_spss_survey>) mean(<retroharmonize_labelled_spss_survey>) sum(<retroharmonize_labelled_spss_survey>) - Labelled SPSS-style vectors with survey provenance
- as_labelled_spss_survey() - Labelled to labelled_spss_survey
- concatenate() - Concatenate haven_labelled_spss vectors
- as_numeric() as_character() as_factor() - Coercion methods for labelled survey vectors