
Importing

Survey data, i.e., data derived from questionnaires or other systematic data collection (for example, inspecting objects in nature or recording prices in shops), are usually stored in databases. They are then exported to complex files that keep at least the coding and labelling metadata together with the data. These files must be imported into R so that the harmonization tasks can be carried out with the appropriate R types.
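In the package, `read_surveys()` and `read_rds()` read real survey files. The base-R sketch below only illustrates the idea of importing several files into one named list while keeping the labelling metadata. The example surveys and the `labels` attribute are hypothetical stand-ins for value labels imported from SPSS or Stata.

```r
# Hypothetical example surveys; the "labels" attribute stands in for
# value labels imported from an SPSS or Stata file.
trust <- c(1, 0, 1, 8)
attr(trust, "labels") <- c(not_trust = 0, trust = 1, do_not_know = 8)

survey_a <- data.frame(rowid = 1:4)
survey_a$trust <- trust            # `$<-` stores the vector with its attributes
survey_b <- data.frame(rowid = 1:3, trust = c(0, 1, 1))

# Write both surveys to a temporary directory as .rds files.
paths <- file.path(tempdir(), c("survey_a.rds", "survey_b.rds"))
saveRDS(survey_a, paths[1])
saveRDS(survey_b, paths[2])

# Import all files into one list, named after the source files.
surveys <- lapply(paths, readRDS)
names(surveys) <- basename(paths)

# The value labels survive the round trip to disk and back.
attr(surveys[["survey_a.rds"]]$trust, "labels")
```

The `.rds` format serializes R objects with their attributes. That is why the labels are still present after reading the file back.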

read_surveys() read_survey()
Read survey file(s)
read_rds()
Read rds file
read_spss()
Read SPSS (`.sav`, `.zsav`, `.por`) files. Write `.sav` and `.zsav` files.
read_dta()
Read Stata DTA (`.dta`) files
read_csv()
Read csv file
pull_survey()
Pull a survey from a survey list

Harmonizing concepts with metadata

After importing data with descriptive metadata such as numerical codes and labels, we need to map the information available in our R session to prepare a harmonization plan. We must identify variables that measure sufficiently similar concepts, so that they can be harmonized and joined into a single variable. Eventually, the tables of such harmonized variables are joined.
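The package builds such a map with `metadata_create()` and `metadata_waves_create()`. As a rough illustration only, the base-R sketch below collects one row per variable per survey. The column names `filename`, `var_name_orig`, `label_orig` and `n_labels`, and the helper `describe_survey()`, are assumptions made for this example. They do not describe the package's output format.

```r
# Two hypothetical surveys asking about the same concept under different names.
trust_a <- c(1, 0, 1)
attr(trust_a, "label") <- "Trust in parliament"
attr(trust_a, "labels") <- c(not_trust = 0, trust = 1)

trust_b <- c(2, 1, 1)
attr(trust_b, "label") <- "Do you trust the national parliament?"
attr(trust_b, "labels") <- c(trust = 1, not_trust = 2)

survey_a <- data.frame(rowid = 1:3)
survey_a$trust_parl <- trust_a
survey_b <- data.frame(rowid = 1:3)
survey_b$q12 <- trust_b
surveys <- list(survey_a = survey_a, survey_b = survey_b)

# Hypothetical helper: one metadata row per variable of one survey.
describe_survey <- function(df, filename) {
  data.frame(
    filename = filename,
    var_name_orig = names(df),
    label_orig = vapply(df, function(v) {
      lab <- attr(v, "label")
      if (is.null(lab)) NA_character_ else lab
    }, character(1)),
    n_labels = vapply(df, function(v) length(attr(v, "labels")), integer(1)),
    row.names = NULL,
    stringsAsFactors = FALSE
  )
}

# Stack the per-survey descriptions into one metadata table.
metadata <- do.call(rbind, Map(describe_survey, surveys, names(surveys)))
rownames(metadata) <- NULL
metadata
```

Reading `label_orig` side by side is what reveals that `trust_parl` and `q12` are candidates for one harmonized variable.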

metadata_create() metadata_waves_create()
Create a metadata table from several surveys
metadata_survey_create()
Create a metadata table
retroharmonize
retroharmonize: Retrospective harmonization of survey data files

Codebooks

The new functions will follow the DDI and SDMX terminology. See the vignette Harmonizing Concepts, Questions, and Variables.

codebook_create()
Create a codebook

Harmonize variable names

Before joining variables containing responses about the same concept, make sure that they have identical names in the re-processed surveys. See the vignette Working with a Crosswalk Table for examples and further clarification.
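In the package, `harmonize_var_names()` does the renaming. The base-R sketch below shows the underlying idea with a hypothetical crosswalk table. Its columns (`id`, `var_name_orig`, `var_name_target`) and the helper `rename_survey()` are illustrative only. Each survey's original names are looked up and replaced with the harmonized name, after which the shared variable can be stacked.

```r
# Hypothetical crosswalk: original name in a given survey -> harmonized name.
crosswalk <- data.frame(
  id = c("survey_a", "survey_b", "survey_b"),
  var_name_orig = c("trust_parl", "q12", "q13"),
  var_name_target = c("trust_parliament", "trust_parliament", "age"),
  stringsAsFactors = FALSE
)

surveys <- list(
  survey_a = data.frame(rowid = 1:2, trust_parl = c(1, 0)),
  survey_b = data.frame(rowid = 1:2, q12 = c(1, 1), q13 = c(34, 51))
)

# Rename the columns of one survey that appear in its part of the crosswalk;
# columns not listed keep their original names.
rename_survey <- function(df, id) {
  map <- crosswalk[crosswalk$id == id, ]
  idx <- match(names(df), map$var_name_orig)
  names(df)[!is.na(idx)] <- map$var_name_target[idx[!is.na(idx)]]
  df
}

renamed <- Map(rename_survey, surveys, names(surveys))

# With identical names, the shared variable can be stacked across surveys.
joined <- do.call(rbind, lapply(renamed, function(df) {
  df[, c("rowid", "trust_parliament")]
}))
```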

harmonize_var_names()
Harmonize the variable names of surveys
label_normalize() var_label_normalize() val_label_normalize()
Normalize value and variable labels
harmonize_survey_variables()
Harmonize survey variables

Codelists

The new functions will follow the DDI and SDMX terminology; the older functions use the term variable harmonization (see below). See the vignette Value Labels and Codelists.

codelist_create()
Create a codelist

Harmonize numerical codes and labels

To merge variables from different surveys into a single variable, you must make sure that the numerical codes and labels, for example 0=‘no’ and 1=‘yes’, are processed identically. See the vignette Harmonize Value Labels for examples and further clarification.
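In the package, `harmonize_values()` performs this step on labelled vectors. The sketch below is a simplified base-R stand-in, not that function. Two hypothetical surveys code yes/no differently (0/1 versus 2/1) and spell the labels differently. Each code is first translated to its lower-cased label and then to the code of a hypothetical common codelist. Lower-casing is the only label normalization assumed here; real labels usually need more, compare `label_normalize()`.

```r
# Survey A: no = 0, yes = 1. Survey B: yes = 1, no = 2, upper-case labels.
codes_a  <- c(1, 0, 0, 1)
labels_a <- c("No" = 0, "Yes" = 1)
codes_b  <- c(2, 1, 2)
labels_b <- c("YES" = 1, "NO" = 2)

# Hypothetical target codelist: normalized label -> common numeric code.
target <- c(no = 0, yes = 1)

# Hypothetical helper: code -> normalized label -> common code.
harmonize_codes <- function(x, labels, target) {
  normalized <- tolower(names(labels))[match(x, labels)]
  unname(target[normalized])   # codes without a label become NA
}

harmonized <- c(harmonize_codes(codes_a, labels_a, target),
                harmonize_codes(codes_b, labels_b, target))
harmonized
```

Going through the labels, rather than the raw numbers, is what prevents survey B's `1` (yes) from being confused with survey A's `0` (no) when the two are merged.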

collect_val_labels() collect_na_labels()
Collect labels from metadata file
harmonize_values()
Harmonize the values and labels of labelled vectors
harmonize_survey_values() harmonize_waves()
Harmonize values in surveys
merge_surveys() merge_waves()
Merge surveys

Harmonize missing and special cases

Some codes have a special meaning, such as the various labels of missing values, which need to be converted differently to numeric, factor, or character representations. See the vignette Harmonize Value Labels for examples and further clarification.
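The package handles SPSS-style missing value ranges with `na_range_to_values()` and missing codes with `harmonize_na_values()`. The base-R sketch below illustrates the two ideas with hypothetical data and helper names; it does not reproduce either function. First, a declared missing range is expanded into the discrete codes actually observed in it. Second, different survey-specific missing codes are mapped to a common missing label, while a numeric representation simply turns them into `NA`.

```r
# Survey A declares the range 9990-9999 as missing (SPSS-style na_range);
# survey B declares the single missing code 8.
x_a <- c(1, 2, 9997, 1, 9999)
na_range_a <- c(9990, 9999)
na_labels_a <- c(do_not_know = 9997, inap = 9999)

x_b <- c(2, 8, 1)
na_labels_b <- c(do_not_know = 8)

# Hypothetical helper: expand a missing range into the observed codes in it.
range_to_values <- function(x, na_range) {
  sort(unique(x[x >= na_range[1] & x <= na_range[2]]))
}
na_values_a <- range_to_values(x_a, na_range_a)

# Numeric representation: declared missing codes become NA.
as_numeric_sketch <- function(x, na_values) {
  x[x %in% na_values] <- NA
  x
}

# Character representation of the reason for missingness, using common labels
# so that 9997 in survey A and 8 in survey B both read "do_not_know".
missing_reason <- function(x, na_labels) {
  names(na_labels)[match(x, na_labels)]
}

as_numeric_sketch(x_a, na_values_a)
missing_reason(x_a, na_labels_a)
missing_reason(x_b, na_labels_b)
```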

collect_val_labels() collect_na_labels()
Collect labels from metadata file
na_range_to_values() is.na_range_to_values()
Harmonize user-defined missing value ranges
harmonize_na_values()
Harmonize na_values in haven_labelled_spss

Crosswalk

Lay out the harmonization crosswalk scheme (unifying variable names, codes, and labels). See the vignette Working with a Crosswalk Table for examples and further clarification.

Subsetting

In your workflow, remove the variables that cannot be harmonized, either in memory (faster for smaller tasks) or sequentially from files. See the vignette Working with a Crosswalk Table for examples and further clarification.
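A base-R sketch of both options follows; it does not use the package's subsetting functions. It assumes, as a simplification, that the variables to keep are those present under the same name in every survey after the names have been harmonized. The surveys and file names are hypothetical.

```r
surveys <- list(
  survey_a = data.frame(rowid = 1:2, trust = c(1, 0), age = c(40, 22)),
  survey_b = data.frame(rowid = 1:3, trust = c(0, 0, 1), income = c(3, 1, 2))
)

# Variables present in every survey.
common_vars <- Reduce(intersect, lapply(surveys, names))

# In memory: subset all surveys at once.
subsetted <- lapply(surveys, function(df) df[, common_vars, drop = FALSE])

# Sequentially from files: only one survey is held in memory at a time.
paths <- file.path(tempdir(), paste0(names(surveys), ".rds"))
for (i in seq_along(paths)) saveRDS(surveys[[i]], paths[i])  # example files
for (p in paths) {
  df <- readRDS(p)
  saveRDS(df[, common_vars, drop = FALSE], sub("\\.rds$", "_subset.rds", p))
}
```

The sequential loop trades speed for memory. This matches the trade-off named above: in-memory subsetting is faster for smaller tasks.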

Documentation functions

This functionality requires a thorough review.

document_survey_item()
Document survey item harmonization
document_surveys() document_waves()
Document survey lists
create_codebook() codebook_waves_create() codebook_surveys_create()
Create a codebook

Type conversion

Consistently treat labels, missing value ranges, and missing value labels imported from SPSS, Stata, or other sources, so that R's statistical functions can be used on them; these mainly work with the numeric or factor base classes. For data visualization, the character base class may be preferred. See the vignette The labelled_spss_survey class for further information.
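The package provides `as_numeric()`, `as_factor()` and `as_character()` for `labelled_spss_survey` vectors. The base-R sketch below shows the three target representations for a hypothetical vector with value labels and declared missing codes. Keeping missing categories as factor levels is one design choice made for the illustration. It is not a claim about how the package's `as_factor()` behaves.

```r
# A hypothetical labelled vector: codes, value labels, declared missing codes.
x <- c(1, 0, 8, 1, 9)
labels <- c(not_trust = 0, trust = 1, do_not_know = 8, inap = 9)
na_values <- c(8, 9)

# Numeric, for statistical functions: declared missing codes become NA.
as_num <- ifelse(x %in% na_values, NA_real_, x)
mean(as_num, na.rm = TRUE)

# Factor: each code becomes its label; here missing categories stay visible
# as levels, which is useful for tabulation.
as_fct <- factor(names(labels)[match(x, labels)], levels = names(labels))
table(as_fct)

# Character: convenient for labelling plots.
as_chr <- names(labels)[match(x, labels)]
```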

survey() is.survey() summary(<survey>)
Create a survey data frame
labelled_spss_survey() as_character() is.labelled_spss_survey() as_numeric()
Labelled vectors for multiple SPSS surveys
as_labelled_spss_survey()
Labelled to labelled_spss_survey
concatenate()
Concatenate haven_labelled_spss vectors
as_factor()
Convert labelled_spss_survey vector to factor