New research data policy

Research Data Management and Access Policy

In 2013, the Office of Science and Technology Policy issued a memorandum requiring federal agencies with more than $100M in annual research funding to put in place plans to ensure public access to the “direct results of federally funded scientific research”. The memo specified that these results include both peer-reviewed publications and digital data. Those agencies, many benefactors, and journals now expect publicly accessible research data products to be an output of research grants. One example of this expectation is the National Institutes of Health implementing its own data sharing policy in January 2023. More recently, the current administration issued a memo in August reiterating these requirements.

Arizona State University is committed to ensuring that we develop a culture of open science and scholarship. Consequently, we believe now is the time to implement an explicit data management and access policy that details how we should manage, publish, and preserve the data for which we are the stewards.

ASU personnel, in collaboration with colleagues from NAU and UA, have prepared a draft policy document. We wish to collect feedback on this policy from faculty and staff conducting research at the university. There is a four-week comment period during which all faculty and research personnel are encouraged to review the draft policy and comment using the survey link. These comments will be reviewed and, where possible, thoughtfully considered. Our goal is to approve and implement this policy by 1/1/2023.

Feedback on the draft policy can be submitted here. The feedback form breaks the policy into its constituent parts and you are welcome to comment on any or all parts of the policy.

The full policy in its draft form is available here if you wish to review alongside the feedback form.

The definitions used in the policy are available here for your reference.

Comments received

Below are the comments received from faculty, service providers and academic units. These will be reviewed and addressed, wherever possible, either in the policy or within the processes and infrastructure implemented to support the policy.

Policy preamble

Tracking compliance and enforcement of the policy are quite different. What is done when there is not compliance?

This is all very bureaucratic and legalistic, which is a start. But what does it mean in practice?

Funding should be made available to assist in offsetting Open access costs.

Often these types of policies are devised when thinking solely of quantitative data. My concern is that the policies that are crafted often do not recognize the sensitive nature of qualitative data and the ease with which respondents (for example, in interview data) can be identified if this data is collected. Often these policies seem to exist without any recognition of the IRB process.

The lawyers have overdone themselves. This section is obvious, and does not need to be in the “plan.” It is an implementation procedure and not part of the data retention plan.

I collect data for “a project” but often go back and reanalyze the same data set using a new theoretical framework or with a different focus. Sometimes I work with data older than 10 years. While there is a chance that I do such reanalysis, I do not want to make the data public because someone else can do a similar analysis and publish using my data. Under this policy, at what point do I have to make the data publicly available? When the data collection completes? When I publish the first article based on data? When I am all done with the data (i.e., finish publishing all I think I can publish based on data, which may be never)?

Policy ownership and application

1. The steward monitors compliance, but what about enforcement? Where does that lie?

Periodic review of this policy is appropriate. Exceptions for student work in credit-bearing coursework are appropriate, but they should include work that satisfies any program requirement. Example: students who take multiple semesters to complete a dissertation often begin their projects in semesters when they are taking dissertation hours (excepted in this draft), but finish them after they have fulfilled their dissertation hours requirement and are taking continuing-registration hours (not clearly covered by this exception), and work on their dissertations in summer, when they do not need to be registered, and summers are apparently not included in this exception. Would the student have the exception in fall and spring semesters when they are taking dissertation hours, but not in summers, or in semesters when they take continuing-registration credits? Please just provide a broader exception tied to either credit-bearing coursework or internships OR other program requirements.

Clarify whether dissertations and theses are covered under this policy.

Again, this should be in a separate plan and not part of the data retention plan. The actual data retention plan should be only a 1-2 page document. Otherwise, it probably will not be read.

The definition of who is covered seems overly broad. For example, are emeritus faculty covered by these policies? Given the overly broad but vague definition of who is covered, it would seem so.

Data stewardship and governance

1. Principal Investigator is well defined for many federal grants but may not be in other cases. 3. See comments above. The institution has the responsibility to ensure that services exist that make compliance with the policy demands practical for the research envisioned, except perhaps for extraordinary cases with extreme data management demands that can and would be funded by the sponsor.

There needs to be space to allow PIs working with Tribal communities to allow them to retain IP ownership and data management. Otherwise, ASU will severely restrict its partnerships with communities and also dissolve any trust relations with communities.

In my experience, it is a lot to expect a faculty person who is a PI on a grant to also act as the data manager, especially when there is no support in many departments for the web interface that might make the data available more widely. I do not have a degree in web design, and yet I have managed my own website for more than a decade. It is not a great website, to say the least.

This sounds good, but what does it mean? If I have NSF-funded data from a previous set of grants that are no longer funded, where is the repository in which I can deposit them, other than my own computer?

The PI assumes all of the responsibility but then has no ownership if they decide to leave – it is weighted against the faculty member. The faculty should either have greater control or there should be other mechanisms in place – such as the ADR of each college having implementation responsibility.

Generally, the data belongs to the contract/grant under which it was obtained. If ABOR wants to retain the data at the institutional level, it should also provide resources required for “open access” publications, which are often required by the contract, especially since many organizations no longer support publication fees in the contract. Another problem is that the data often does not contain the understanding. To fully provide information on the intent and meaning of the data is an additional burden if this requires more work than the publication.

Under Institution Responsibilities: The university should have an obligation to make the necessary resources available for compliance with this policy, i.e., Item 3a(3) should read “Provide the Institutional resources necessary for complying with elements of this policy, including …, and communicate their availability to Researchers.”

Perhaps you will cover this in coming pages, but it would be useful to say something about data resulting from research not funded by grants, when the term Principal Investigator is not relevant. For example, PhD theses, or undergraduate honors theses. The university usually invests significantly in these data, in terms of facilities, equipment, advisors’ salaries, or all of these. Therefore, data collected should be considered university property, not student property, or at least, co-owned with the student. The student and advisor should jointly have the responsibility to ensure that the data are properly stored, annotated, protected and available.

Data management practices

The current practices of managing data in our project do not always ensure confidentiality of those who participate in the research. We save research data in Google Drive and it is shared with different people both inside and outside the project. Although the project manager and PIs may have tried to implement good data management practices, those who have access to data do not always respect the IRB provisions in terms of data management. Also, Google Drive does not have the necessary functions to ensure the protection of sensitive data. The institutions I previously worked for had stricter data management policies.

Excellent.

The section on Data Use Agreements, and the definition of the term, does not address the situations where researchers must sign an agreement to gain access — are these data use agreements that individual researchers cannot make? Two categories suffice: 1) restricted-data agreements, such as an economist gaining nominal employee status at a Fed to gain access to data, admission to a Federal Statistical Research Data Center, or access to restricted-data sets such as those made available by the U.S. Department of Education; 2) archives, such as a university special collections department. In both cases, I am concerned that draft policy would severely hamper research activities. What would be the approval steps required for individual researchers to make such access arrangements? Would the ASU Office of General Counsel need to bless archival agreements, or insist that archives change their agreements? That is going to be a nonstarter. To provide an example of the complications involved: in most cases, archival entry agreements point out that the control and ownership of materials in an archive is not guaranteed to be the archive itself, and that researchers have the burden of gaining permission from the rights-holder for more than fair use of the materials. At the same time, modern archival practices allow researchers to use (non-flash) photography for individual research use. In my current major project, I have more than 4000 PDFs of archival material in my Zotero database. From an archives standpoint, this is a reasonable practice for me as an individual researcher, and providing me the ability to scan documents with my phone’s camera is not license for me to publish those PDFs without the permission of the rights-holder for each individual item. 
But from the standpoint of this draft policy, it looks like (a) I would not have the ability to sign an ordinary archives-entry agreement without running it up an administrative chain; and (b) I would somehow be required to publish my database with its 4000-plus PDFs. This is an unreasonable burden for archival research.

What is the “approval” process for a “university-approved storage medium”? This should be outlined first because a non-inclusive approval process can undercut and undermine community standards and trust.

Who will assist the faculty in developing these materials? How will time be allocated to ensure that faculty can do this – especially untenured faculty?

Please think about qualitative researchers.

Sensitive data is similar to classified data. While limitations are necessary in each case, the plan has to be sensitive to the need for the student to have de-classified versions of the data for use in theses and dissertations. These should not be considered as sensitive or classified.

The criteria are subjective, and they are to be interpreted by “administrators”. In case of disputes between administrators and researchers, there are no mechanisms for dispute resolution and researchers appear to be at the mercy of administrative decisions.

Data access and availability

I suggest including GitHub as a source for public access compliance. Many researchers use it and I do not believe the URLs change unless the researchers change them themselves.

1.a.(3) Should be explicit about ensuring reasonable discoverability of the data. 2c and 3 may be in conflict. It is the longer that should be the minimum. 4. I agree with what is stated in #4: it is what is not stated that is the problem. Yes, people should budget for data management. The funding issues are inadequately dealt with here and in the rest of the policy. The policy assumes that there exists some infrastructure that can satisfy the demands of the policy, usually an institutional or disciplinary repository, but what if that does not exist? One cannot realistically budget to build a research data management infrastructure where none exists. It seems to me that there is some institutional responsibility (that is not mentioned anywhere) to provide services (that may have a cost) to satisfy the demands of the policy. As I recall, unfunded research is also covered by the policy, but in that case there is no budget. What does the researcher do? Again, is there not some institutional responsibility here? If the institution does not ensure that there are services available at some sort of practical cost to make compliance possible, the policy will be ignored, as the federal rules are frequently ignored now. I note also that the policy’s provision for enforcement is almost completely lacking and without enforcement it will have little effect.

The section on Data Access and Availability, and the definition of Research Data, do not make clear what interim work products are included. Most research includes a variety of analyses that never make it to publication, because they are not noteworthy, because there are errors in the analysis, or because there are subsequent better choices made in analysis. Read too broadly, this draft section and the definition of Research Data could require retention, organization, and publication of every research dead-end that happens on a project, including (for example) versions of statistical coding whose primary trait is that they have syntactical errors. Requiring that researchers preserve the complete version history of quantitative statistical program files, or qualitative coding, or even badly-formatted figures, would be an unreasonable burden that does not meet the spirit of the new federal rule or the appropriate intent of transparency in scholarship. There needs to be a reasonable exception for interim work products that are not part of final publications.

See previous comments about managing website where public info can be made available.

I was confused about the line in paragraph (2) that begins “The requirements from A) will extend to. . . ” There are a number of paragraphs marked A in this document, but the only capital-A paragraph seems to be the one titled “Policy Owner,” on p. 1 of the document. But that’s about “ownership” of this policy, so I don’t see how it applies to the “public access to research data” section of the document. Maybe the “A” was supposed to read “1” — i.e., the paragraph immediately above? If so, does “any research data produced under the auspices of ABOR universities” mean that even datasets a faculty member generates in unfunded research must be “made available” — even if the only support given by the university is one’s regular salary and office space, computer, etc.? As I said, I find this paragraph confusing.

Do we plan to have a data repository – in the library? Or???

Who will the University assign to manage and ensure this process is occurring? How will faculty be supported in doing this? It is adding multiple steps to the research process. How will data that needs to remain classified/not available – such as qualitative recordings – be exempt? What will be done to support/host the data on University servers? What about data that is required to be submitted elsewhere?

I would appreciate if the University would consider working together with common publishing houses and working out deals for better open access rates similar to the UC system. This would allow for easier sharing of results particularly for early career faculty with pilot funding that covers the cost of the project, but not the publication expenses.

The definition and nature of “final research data” in 1.a.1 needs to be clarified. (“making available the final Research Data no later than 2 years after the end of the funded period”). In many fields, a research project has a sequence of steps, and the “final data” may require analyses done under several grants. It would be impossible to post “final data” within 2 years of the initial grant. The draft policy may lead to the posting of premature, incomplete, and perhaps erroneous data in order to comply with the requirement. Here is an example from my NSF-funded archaeological research. The first grant typically will fund fieldwork. The artifacts and materials recovered in that fieldwork will require analyses that may take many years to complete. Sometimes there is a second grant for analysis, but sometimes one has to spend more time because adequate funding cannot be obtained to classify and analyze all the artifacts, making it a drawn-out process using meager funds (it is easier to get funding for fieldwork than for artifact analysis). Even if one could define a set of data pertaining only to the first of a sequence of grants, and arrange to post those data in a timely fashion, the situation is complicated. The full results and implications of the initial fieldwork data are not known until the subsequent steps are carried out. In archaeology, these post-fieldwork activities, whether funded by one or more new grants or not, often take many years to complete. By posting incompletely analyzed data, as required by the draft policy, those data will later be superseded and improved. Can they be removed when they are superseded? Or will problematic premature data be posted in perpetuity because of this policy? Therefore, the concept of “final research data” needs to be clarified to account for multi-stage (and often multi-grant) projects. This is common in fields where primary field data are gathered, which requires more time in analysis.

Please discuss with IRB folks to make sure this will work for qualitative scholars.

1.(3) Most journals accept the statement “data is available from the authors upon request.” If ASU is to begin saving the data (and its interpretation) centrally, this is a big burden upon the research team. Normally, data is retained on local hard drives which can keep it for quite some time. An example of the problem of central storage is the university faculty web pages. In just the last decade plus, support for web pages has changed at least three times. My pages would have disappeared if no fuss was made. Still, they have had to be regenerated as this has happened. Cost means throwing away data!

The university is unilaterally imposing costs on researchers. This would be OK if the costs were covered by research grants, but they are not in many cases. The university should be willing to provide professional assistance to those researchers whose grants do not cover data retention costs or those who have unfunded projects.

Public Access or public data accessibility is essential to ASU’s realizing its mission as a public enterprise and to achieving community embeddedness and DEI.

Data use and reuse

This grants too much agency to the institution to set these rules.

This is a good idea, that will likely be of benefit to student RAs.

I do worry that there will be some discrepancies between what is required here and what IRB is going to be asking of us. I am usually asked to keep all transcripts of interviews in locked cabinets, etc. and that there is going to be a disconnect between these policies and IRB.

Can this policy include a link to policies about sensitive and regulated data? Consent forms used decades ago did not specify anything about data sharing. How can this be handled?

1) It seems as though it does not need to be at the level of the Vice President of Research. 2) This policy needs an appeal process – the VP of Research may seek to hinder or harm an employee who is leaving under less than ideal circumstances. How will this policy not be abused to require an employee not to leave under the threat of data not being allowed to be taken? 3) Similarly, the policy around students should be modified to include an appeal process – especially if the student is leaving due to harassment by the supervisor. Overall, I have a concern that the policy can be used to force individuals to remain or to harm untenured faculty members and students.

What happens in cases of retirement? Who are acceptable stewards in cases where a PI leaves and does not wish to take ownership of data? What happens in cross-university collaborations?

Item 4: If a student leaves, often the data and interpretations are required to be retained under the grant, which means the student cannot take the actual data, but only a copy. On the other hand, computer code which has been modified by the student might be construed to be his/her personal intellectual property. (There have been lawsuits at ASU over the ownership of the computer code.)

This opens possibilities for administration to impede researchers’ freedom to take a position at another university or other employer. It seems to provide no safeguards for individuals and assigns most decision-making rights to the university.

Data management plans

Good

There needs to be flexibility to allow ASU PIs working with Tribal communities to store/archive/collectivize data in Tribally-led data repositories and storage structures.

The Universities should develop a standard DMP that can be adapted based on need.

There was a bit in the document (not sure which of these pages) about getting institutional permission to share the data outside (or inside) ASU. This will kill collaboration with researchers at other universities. Such collaboration often means passing data back and forth in a timely manner, and not having long delays for institutional permission each time data is shared. The plan needs a better approach.

Personally, I believe that DMPs should be required for PhD theses, and arguably honors theses. This would ensure that research data in which the university is invested is conserved.