Survey data, that is, data derived from questionnaires or from systematic data collection such as inspecting objects in nature or recording prices at shops, are usually stored in databases and exported to complex files that retain at least the coding and labelling metadata together with the data. These files must be imported into R so that the harmonization tasks can be carried out with the appropriate R types.
- Read rds file
- Read SPSS (`.sav`, `.zsav`, `.por`) files. Write `.sav` and `.zsav` files.
- Read Stata DTA files (`.dta`)
- Read csv file
- Pull a survey from a survey list
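As a minimal sketch of what "retaining coding and labelling metadata" means in practice, the example below writes a toy survey to a temporary SPSS file and reads it back. It calls the haven package directly rather than the import functions listed above, and the variable names (`id`, `trust`) are invented for illustration.

```r
library(haven)

# Toy survey with one labelled item; the variable names are made up.
survey <- data.frame(id = 1:4)
survey$trust <- labelled(
  c(1, 0, 1, 9),
  labels = c(no = 0, yes = 1, declined = 9),
  label  = "Trust in institutions"
)

# Write to a temporary SPSS file, then read it back.
sav_file <- tempfile(fileext = ".sav")
write_sav(survey, sav_file)
imported <- read_sav(sav_file, user_na = TRUE)

# The value labels and the variable label travel with the data.
attr(imported$trust, "labels")
attr(imported$trust, "label")
```

Setting `user_na = TRUE` asks haven to keep user-defined missing values as labelled codes instead of converting them to `NA` on import.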
After importing data with descriptive metadata such as numerical coding and labelling, we need to map the information that is in our R session to prepare a harmonization plan. We must find the variables that measure sufficiently similar concepts to be harmonized and joined into a single variable; eventually, the similar variables are joined into a single table.
- Create a metadata table
- retroharmonize: Retrospective harmonization of survey data files
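The idea of such a map can be sketched with base R and haven: one row per variable, recording its class, its variable label and how many value labels it carries. `describe_vars()` is a hypothetical helper written for this illustration and is not the package's own metadata function.

```r
library(haven)

# Toy survey; the variable names are invented for this example.
survey <- data.frame(id = 1:3)
survey$trust <- labelled(
  c(1, 0, 9),
  labels = c(no = 0, yes = 1, declined = 9),
  label  = "Trust in institutions"
)
survey$age <- c(34, 51, 27)

# describe_vars() is a hypothetical helper: one row of metadata per column.
describe_vars <- function(df) {
  data.frame(
    var_name = names(df),
    class = vapply(df, function(x) class(x)[1], character(1)),
    label = vapply(df, function(x) {
      lbl <- attr(x, "label", exact = TRUE)
      if (is.null(lbl)) NA_character_ else lbl
    }, character(1)),
    n_labels = vapply(
      df, function(x) length(attr(x, "labels", exact = TRUE)), integer(1)
    ),
    row.names = NULL,
    stringsAsFactors = FALSE
  )
}

metadata <- describe_vars(survey)
metadata
```

Stacking such tables from several surveys gives a single place to look for variables that describe the same concept.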
The new functions will follow the DDI and SDMX terminology. See the vignette Harmonizing Concepts, Questions, and Variables.
- Create a codebook
Before joining variables containing responses about the same concept, make sure that they have identical names in the re-processed surveys. See the vignette Working with a Crosswalk Table for examples and further clarification.
- Harmonize the variable names of surveys
- Harmonize survey variables
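A crosswalk for variable names can be sketched in base R as a small lookup table that is applied to each survey before binding them together. The column names (`survey`, `var_orig`, `var_new`) and the helper `rename_with_crosswalk()` are assumptions made for this example, not the package's interface.

```r
# Two toy surveys that record the same concept under different names.
wave_1 <- data.frame(uniqid = 1:2, qa10_1 = c(1, 0))
wave_2 <- data.frame(uniqid = 3:4, qb5_trust = c(0, 1))

# A hand-made crosswalk: original name -> harmonized name, per survey.
crosswalk <- data.frame(
  survey   = c("wave_1", "wave_1", "wave_2", "wave_2"),
  var_orig = c("uniqid", "qa10_1", "uniqid", "qb5_trust"),
  var_new  = c("id", "trust", "id", "trust")
)

# Hypothetical helper: rename the columns that appear in the crosswalk
# and leave any other column name unchanged.
rename_with_crosswalk <- function(df, survey_id, crosswalk) {
  rules <- crosswalk[crosswalk$survey == survey_id, ]
  idx <- match(names(df), rules$var_orig)
  names(df)[!is.na(idx)] <- rules$var_new[idx[!is.na(idx)]]
  df
}

# Once the names agree, the surveys can be stacked.
harmonized <- rbind(
  rename_with_crosswalk(wave_1, "wave_1", crosswalk),
  rename_with_crosswalk(wave_2, "wave_2", crosswalk)
)
harmonized
```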
The new functions will follow the DDI and SDMX terminology. The old terminology uses variable harmonization (below). See the vignette Value Labels and Codelists.
- Create a codelist
To merge variables from different surveys into a single variable, you must make sure that the numerical codes and labels, for example 0=‘no’ and 1=‘yes’, are processed identically. See the vignette Harmonize Value Labels for examples and further clarification.
- Harmonize the values and labels of labelled vectors
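To illustrate why codes and labels must be processed identically, the sketch below takes two toy items that code the same answers differently (0/1 versus 1/2, with different capitalisation) and maps both onto one target coding. It uses haven directly; `harmonize_labelled()` is a hypothetical helper, and matching on lower-cased label text is an assumption that only works when the labels differ in case or whitespace.

```r
library(haven)

# The same question, coded differently in two surveys.
a <- labelled(c(1, 0, 1), labels = c(no = 0, yes = 1))
b <- labelled(c(2, 1, 1), labels = c(Yes = 1, No = 2))

# The common coding that both items should end up with.
target <- c(no = 0, yes = 1)

# Hypothetical helper: go from codes to label text, normalise the text,
# then look up the target code for each label.
harmonize_labelled <- function(x, target) {
  text <- tolower(trimws(as.character(as_factor(x, levels = "labels"))))
  labelled(unname(target[text]), labels = target)
}

ha <- harmonize_labelled(a, target)
hb <- harmonize_labelled(b, target)

# With identical codes and labels the responses can be joined.
combined <- labelled(c(as.numeric(ha), as.numeric(hb)), labels = target)
combined
```

Joining `a` and `b` without this step would silently treat code 1 as "yes" in both, but code 2 ("No" in the second survey) would have no counterpart in the first.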
Some variable codes have a special meaning, such as the various labels of missing values, which need to be converted differently to a numeric, factor or character representation. See the vignette Harmonize Value Labels for examples and further clarification.
- Harmonize na_values in haven_labelled_spss
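The following sketch shows one way user-defined missing codes from two surveys could be brought onto a shared scheme with haven's `labelled_spss()`. The target codes 997 and 998 and the helper `recode_missing()` are assumptions chosen for this example; they are not the conventions of the function listed above.

```r
library(haven)

# Two surveys flag non-response with different codes.
v1 <- labelled_spss(
  c(1, 0, 9, 8),
  labels    = c(no = 0, yes = 1, do_not_know = 8, declined = 9),
  na_values = c(8, 9)
)
v2 <- labelled_spss(
  c(1, 99, 0),
  labels    = c(no = 0, yes = 1, refused = 99),
  na_values = 99
)

# Shared scheme (an assumption for this sketch).
common_labels <- c(no = 0, yes = 1, do_not_know = 997, declined = 998)
common_na     <- c(997, 998)

# Hypothetical helper: replace old missing codes with the shared ones.
recode_missing <- function(x, from, to) {
  values <- as.numeric(x)
  idx <- match(values, from)
  values[!is.na(idx)] <- to[idx[!is.na(idx)]]
  labelled_spss(values, labels = common_labels, na_values = common_na)
}

h1 <- recode_missing(v1, from = c(8, 9), to = c(997, 998))
h2 <- recode_missing(v2, from = 99, to = 998)

# is.na() treats the user-defined missing codes as missing.
is.na(h1)
is.na(h2)
```

Keeping the reason for non-response as a labelled code, rather than collapsing everything to `NA` at import, leaves the choice of how to treat it to the analysis stage.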
Lay out the harmonization crosswalk scheme (unifying variable names, codes, and labels). See the vignette Working with a Crosswalk Table for examples and further clarification.
Remove the variables that cannot be harmonized from your workflow, either in memory (faster for smaller tasks) or sequentially from files. See the vignette Working with a Crosswalk Table for examples and further clarification.
- Document survey item harmonization
Consistently treat labels, missing value ranges and missing value labels imported from SPSS, Stata or another source, so that the data can be used with R's statistical functions, which mainly work with the base classes numeric and factor. For data visualization, the base class character may be preferred. See the vignette The labelled_spss_survey class for further information.
- Labelled vectors for multiple SPSS surveys
- Labelled to labelled_spss_survey
- Concatenate haven_labelled_spss vectors
- Convert labelled_spss_survey vector to factor
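The three target representations can be sketched with a plain haven `labelled_spss` vector and base R. This illustrates the general idea only and does not use the labelled_spss_survey class or its conversion methods; the toy values and labels are invented.

```r
library(haven)

x <- labelled_spss(
  c(1, 0, 9, 1),
  labels    = c(no = 0, yes = 1, declined = 9),
  na_values = 9
)

labels <- attr(x, "labels")
valid  <- labels[!labels %in% attr(x, "na_values")]

# Numeric: user-defined missing codes become NA so that they do not
# distort statistics such as the mean.
num <- as.numeric(x)
num[is.na(x)] <- NA

# Factor: only the valid categories become levels.
fct <- factor(num, levels = unname(valid), labels = names(valid))

# Character: keep every label, including the missing one, which can be
# useful in plots and tables.
chr <- names(labels)[match(as.numeric(x), labels)]

mean(num, na.rm = TRUE)
table(fct)
chr
```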