Studies involving the secondary use of existing (archival) data have all of the following characteristics:
- Data that are collected for non-research purposes (e.g. student records) or collected for a research study other than the proposed study (e.g. another study’s data set)
- The proposed study plans to use the existing data as opposed to gathering new data (or possibly in conjunction with newly gathered data)
- The data contain information that can be linked to individuals (though not necessarily to the individual’s identity)
- The data are the primary source (versus a secondary source where the data were analyzed for another publication)
- While the data usually exist prior to the protocol’s approval, there are instances in which the data can continue to accumulate; however, the researcher cannot be engaged in gathering the data. For example, students can continue to add content to their student records that can be accessed by the researcher (with appropriate permission), but the researcher is not engaged in contacting the students directly for this information.
In order for the Board to assess the risks to the participants through the use of existing data sources and make recommendations for ethical use of the data, it will need to know the following:
- How did you obtain access to the data? The Board will need to know if the data are publicly available or if there are restrictions for accessing the data. If access is restricted, the Board will need to know how you obtained permission to access the data.
- What do the data consist of? The Board will need to know if you are using data sets, video tapes, audio tapes, journal entries, transcripts, etc. If you are using data sets, the Board will need to know what data fields you will use.
- How many records will you access? Will the data be combined with other data sources? How easy is it to deduce the identities of the participants? The Board needs to understand the complete picture of the data and the potential to deduce identity, which could compromise confidentiality.
- Can the participants be linked to their data? The Board will need to know in what form you will receive the data. Can the data be de-identified? Are the data linked and stripped of identifiers? Who prepared the data for you? Will you merge multiple data sets?
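As an illustration of what "linked and stripped of identifiers" can look like in practice, the following is a minimal Python sketch, not a procedure the Board prescribes. It assumes the person preparing the data replaces direct identifiers with a random study code and keeps the linking key separate from the researcher. The column names (`name`, `student_id`, `email`) and the function `strip_identifiers` are hypothetical.

```python
import secrets

# Hypothetical column names, chosen only for illustration.
DIRECT_IDENTIFIERS = {"name", "student_id", "email"}


def strip_identifiers(records, id_field="student_id"):
    """Return (coded_records, linking_key).

    coded_records: copies of the input rows without direct identifiers,
        each tagged with a random study code.
    linking_key: maps study code -> original identifier. The data preparer
        would store this separately from the coded records.
    """
    coded_records = []
    linking_key = {}
    codes_by_id = {}
    for row in records:
        original_id = row[id_field]
        if original_id not in codes_by_id:
            code = secrets.token_hex(4)
            while code in linking_key:  # regenerate on the rare collision
                code = secrets.token_hex(4)
            codes_by_id[original_id] = code
            linking_key[code] = original_id
        # Keep only the fields that are not direct identifiers.
        kept = {k: v for k, v in row.items() if k not in DIRECT_IDENTIFIERS}
        kept["study_code"] = codes_by_id[original_id]
        coded_records.append(kept)
    return coded_records, linking_key


sample = [
    {"name": "A. Student", "student_id": "1001", "email": "a@example.edu", "gpa": 3.4},
    {"name": "B. Student", "student_id": "1002", "email": "b@example.edu", "gpa": 3.9},
    {"name": "A. Student", "student_id": "1001", "email": "a@example.edu", "gpa": 3.6},
]
coded, key = strip_identifiers(sample)
```

In this sketch the researcher would receive only `coded`; rows from the same person share a study code, so records remain linked to an individual without revealing who that individual is. Removing direct identifiers does not by itself rule out deducing identity from the remaining fields.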
For suggestions on how to create a "Secondary Data" iProtocol, see Creating and Submitting a New iProtocol.
Exempt studies are not under the same obligation to obtain consent from participants (though the Board often asks researchers to provide information about the study to participants using a Study Notification). The federal regulations allow the Board to exempt research involving the secondary use of existing data if either of the following is true:
- the identifiable private information is publicly available
- the information is recorded by the investigator in such a manner that the identity of the participant cannot be readily ascertained directly or through identifiers linked to the subject, the investigator does not contact the subjects, and the investigator will not re-identify the subjects.
In addition, data sets that specifically target prisoners cannot be exempted. If a prisoner is incidentally included in a data set, the data set can be exempted.
To determine exemption, the Board evaluates the existing data source (i.e. public or private) and whether the data can identify the participants. If the protocol qualifies for exemption, the Board does not require researchers to obtain consent from participants. If the protocol does not qualify for exemption, the Board may consider waiving consent or may require that the researcher obtain consent from participants.
Various government agencies and academic institutions make the data they collect available to the public for research purposes. Any data set that is made available to the public and does not require special permission to access is considered a publicly available data set. Generally speaking, publicly available data sets don’t meet the federal definition of human subject research.
The data sets listed below do not require IRB review except in the case where the data sets are merged with other data or if the data archive requires IRB review:
- Inter-University Consortium for Political and Social Research (ICPSR)
- National Center for Health Statistics
- National Center for Education Statistics
- National Election Studies
- U.S. Bureau of the Census
Additional data sets and archives may qualify for inclusion on this list. Investigators who wish to have a specific data set or data archive considered for inclusion on this list should submit the following information to irbsbs@virginia.edu:
- In the email subject, please add the following: IRB-SBS archival data set review
- The name of the data set or data archive; and
- The URL for the data set/archive or other specific information on how to obtain the data set; and
- An abstract that describes the content and potential uses of the data set/archive.
Private data sets may include (but are not limited to): data collected previously by another researcher for another study, data collected by another agency for evaluative or research purposes, or your own data that you collected for a previous study. Private data sets generally require permission to access the data, and the Board will need to know that you will obtain (or have already obtained) proper permission from the appropriate entity.
Private records are data that were not collected with the intent to conduct research, but instead exist for the purpose of collecting information on individuals for the individual’s own sake. For example, student records, medical records, credit histories, etc., are private records that are maintained by agencies other than the individual but contain personal information about the individual. Some of these records are collected by government agencies and by law are accessible to the public; thus they fall under the publicly available data sets category. Private records can be governed by privacy laws and regulations, thus requiring special permission to access the records as well as additional safeguards for using the data. Some researchers may have access to private records as part of their professional role; for example, you may be able to access student records as a professor, but you will still need to obtain permission to access records as a researcher (particularly because these records are also protected by FERPA regulations). These records can still qualify for exemption if the data are received stripped of identifiers.
Student Records and Classroom Data: Please see Education: Student Records.
Medical Records and HIPAA:
The IRB-SBS does not review studies where a medical record is used; these studies are reviewed by the IRB-HSR. If you have any questions regarding which IRB should review your study, check out the HSR/SBS decision algorithm. If this doesn’t answer your question, please contact our office (or the HSR) before completing our protocol form as each IRB has separate submission procedures.
Combining data sets can provide interesting insights into behavior and provide rich information for statistical models. However, combining data can also increase the ability to identify individuals in de-identified data sets. From the OHRP website:
“A subset of “big data research” uses ongoing and constantly replenished and revised data systems, with analysis updated in real time as new information becomes available. In some instances these may be ongoing “longitudinal” studies; may involve Bayesian designs for data collection and analysis; and can involve “adaptive” study designs that change as new information becomes available and is added to the data being analyzed. Increasingly in the social and behavioral research context, longitudinal data systems link multiple ongoing data streams (e.g., student records, employment, social welfare services, health records, police encounters, arrest records), and these study designs can, over time, create risks of re-identification and misuse that are not present in studies using static data sets.”
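To make the re-identification concern concrete, here is a small Python sketch under stated assumptions; it is not an OHRP or Board method. It counts how many records share the same combination of indirectly identifying fields, before and after two de-identified data sets are linked. The field names, the sample values, and the helper functions `smallest_group` and `merge_on` are all hypothetical.

```python
from collections import Counter


def smallest_group(records, fields):
    """Size of the smallest group of records sharing the same values in `fields`."""
    counts = Counter(tuple(r[f] for f in fields) for r in records)
    return min(counts.values())


def merge_on(left, right, key):
    """Join two lists of records on a shared key (inner join, one row per key in `right`)."""
    right_by_key = {r[key]: r for r in right}
    return [{**l, **right_by_key[l[key]]} for l in left if l[key] in right_by_key]


# Illustrative, made-up records keyed by a study code rather than a name.
school = [
    {"study_code": "a1", "grade": 11, "zip": "22901"},
    {"study_code": "b2", "grade": 11, "zip": "22901"},
    {"study_code": "c3", "grade": 12, "zip": "22903"},
    {"study_code": "d4", "grade": 12, "zip": "22903"},
]
employment = [
    {"study_code": "a1", "employer": "Cafe"},
    {"study_code": "b2", "employer": "Library"},
    {"study_code": "c3", "employer": "Cafe"},
    {"study_code": "d4", "employer": "Cafe"},
]

before = smallest_group(school, ["grade", "zip"])              # 2
merged = merge_on(school, employment, "study_code")
after = smallest_group(merged, ["grade", "zip", "employer"])   # 1
```

In this made-up example every school record shares its grade and zip code with at least one other record, but once employer is linked in, some combinations describe exactly one person. A group size of 1 means that anyone who knows those few facts about an individual could single out that individual's record.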
The IRB regulations require that researchers obtain IRB approval/exemption prior to collecting any data. The Board cannot retroactively approve the collection of data that falls under our definition of research. However, the regulations recognize that there are instances where data is collected without the intention to conduct human subjects research and this data could prove to be valuable information in a later study. For example, information collected in a pilot study to test the feasibility of conducting a full study may be viable data to include in the full study. A pilot study doesn’t necessarily qualify as “research” according to the IRB regulations. The same could be true for a class project where data was collected for a brief paper submitted to a professor, but later provided necessary information for a full dissertation or thesis project.
This should not be considered a loophole for avoiding IRB review, however. In order to approve the use of this data, the IRB will review the collection of the data and hold it to the same standards required for any collection of data. If the IRB finds that the data was not collected according to our ethical guidelines and regulations, the Board will not allow the data to be used. For example, if you collect sensitive information that can be linked to an individual but the participant did not consent to the collection of this data, the Board may not approve the use of this data because of the manner in which it was collected.
In order to avoid this scenario, we recommend that you contact our office for further guidance regarding data collection. Depending on the project, we may advise that you submit a protocol for a pilot study or class project, which will help you avoid any question about the viability of your data. If you don’t need to submit a protocol at this time, we can provide suggestions and recommendations for collecting your data so that it can be approved at a later date if you decide to use it.
- Describe research involving the secondary use of existing data by creating a Data Source in the Data Source section.
- Upload any additional resources that describe the existing data in the Data Source Upload.
- Upload any files that document permission to access data in the Permissions section.
- If you have more than one Data Source and the sources are linked, use the Associate Data Sources with Data Sources section to demonstrate and describe this relationship.
- Use the Associate Data Sources with Participant Groups section to demonstrate the relationship between Participant Groups and Data Sources (if you have more than one of both).