Deposit data into a University repository

Preparing data for repository deposit

These best-practice methods will help you to prepare your data for deposit in a repository. The guidance below covers:

file formats
file naming
structuring data
metadata.

We also provide a sample dataset to model good practice.

File formats

We recommend using open file formats where possible. See more information on choosing file formats.

File naming

File and folder names must not include spaces or special characters. The only exceptions are hyphens (-) and underscores (_), which may be used to separate words.
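The naming rule above can be checked automatically before deposit. The following is a minimal sketch, not a repository tool: the function name is_valid_name is hypothetical, and it assumes that "special characters" means anything other than ASCII letters, digits, hyphens and underscores. It also assumes that dots are acceptable, because the examples in this guidance use them in file extensions and version numbers.

```python
import re

# Assumption: only ASCII letters, digits, hyphens, underscores and dots
# are allowed. Dots are permitted here for extensions and version numbers.
_ALLOWED_NAME = re.compile(r"[A-Za-z0-9._-]+")


def is_valid_name(name: str) -> bool:
    """Return True if the file or folder name has no spaces or special characters."""
    return _ALLOWED_NAME.fullmatch(name) is not None


if __name__ == "__main__":
    for candidate in ["Data_raw", "Images-maps", "my data.csv", "results(final).xlsx"]:
        print(candidate, is_valid_name(candidate))
```

Running the sketch over every name in a dataset gives a quick list of names that need to be changed before upload.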

Use a consistent naming convention, with meaningful, descriptive names and a clear folder structure, to aid navigation. For example, Data and Documentation could be top-level zip folders with sub-folders for each file type:

Data.zip
> Data
>> Data_raw
>> Data_clean
>> Data_metadata

> Images
>> Images_charts
>> Images_maps

> Code
>> Code_Data_Processing_Script.py
>> Code_Visualisation_Script.R

Documentation.zip
> Literature_review.docx
> Methodology
> readme.txt

Include the creation or last modification date and the version number in file names where needed, for example: Data_collection_methodology_V1.2_2023_06_21
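If many files need dated, versioned names, generating them in code keeps the pattern consistent. This is a minimal sketch that follows the pattern of the example above (name, then V and the version, then the date as year_month_day). The function versioned_name and its parameters are hypothetical and are not part of any repository tool.

```python
from datetime import date


def versioned_name(stem: str, version: str, when: date, extension: str = "") -> str:
    """Build a name such as Data_collection_methodology_V1.2_2023_06_21."""
    stamp = when.strftime("%Y_%m_%d")  # year_month_day, as in the example above
    return f"{stem}_V{version}_{stamp}{extension}"


if __name__ == "__main__":
    print(versioned_name("Data_collection_methodology", "1.2", date(2023, 6, 21)))
```

Writing the date with the year first means that files sort chronologically when listed by name.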

Structuring data

When structuring data:

each top-level folder should be a maximum of 2GB in size
limit the number of top-level folders to a maximum of 12
use zip folders at the top level. Avoid nesting zip folders: this does not improve compression and makes navigation and access harder.
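The three structuring rules above can be checked with a short script before you start the deposit. This is a minimal sketch under stated assumptions: the function check_deposit is hypothetical, the dataset is assumed to sit in one local folder with the zip folders directly inside it, and 2GB is taken to mean 2,000,000,000 bytes.

```python
import zipfile
from pathlib import Path

MAX_TOP_LEVEL = 12         # maximum number of top-level zip folders
MAX_BYTES = 2 * 1000 ** 3  # assumption: 2GB means 2,000,000,000 bytes


def check_deposit(folder: Path, max_bytes: int = MAX_BYTES) -> list[str]:
    """Return a list of problems found in the top-level zip folders."""
    problems = []
    archives = sorted(folder.glob("*.zip"))
    if len(archives) > MAX_TOP_LEVEL:
        problems.append(f"{len(archives)} top-level zip folders, maximum is {MAX_TOP_LEVEL}")
    for archive in archives:
        if archive.stat().st_size > max_bytes:
            problems.append(f"{archive.name}: larger than the size limit")
        # Look for zip files stored inside this zip folder.
        with zipfile.ZipFile(archive) as zf:
            nested = [n for n in zf.namelist() if n.lower().endswith(".zip")]
        if nested:
            problems.append(f"{archive.name}: contains nested zip folders ({', '.join(nested)})")
    return problems


if __name__ == "__main__":
    for problem in check_deposit(Path(".")):
        print(problem)
```

An empty result means that none of the three checks failed. It does not replace a manual review of the folder structure.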

If this structure is not possible for your data, then email us at [email protected] for advice.

Readme template

When you upload data into one of our repositories, we will provide you with a readme file to fill out in step two of the depositing process.

The readme template includes instructions in square brackets. Complete the template according to these instructions then delete them.

Metadata spreadsheet

As part of step two of the depositing process, you’ll be asked to populate a spreadsheet with your metadata.

Follow our guidance to complete the spreadsheet correctly:

Enter the information about your dataset into the ‘Datasets’ tab of the spreadsheet.
The yellow fields are mandatory.
Use a new column where there is more than one entry for a field, for example if you have multiple grant numbers or subjects.
The Subjects field uses a controlled list – see the ‘Subjects’ tab for the options and enter only the code, for example I260. You may choose multiple subjects.
List your top-level folders and files in the ‘Files’ tab. Title and Description are optional but may help differentiate between files with similar names. Where the Title field is left blank, the record will display the file name.
The Content type field uses a controlled list: metadata, program or data. Most files will be data or documentation.
Enter the full file name including the extension, for example readme.txt or data.zip. Ensure the filename entered matches that of the file and contains no spaces.
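Mismatches between the names typed into the 'Files' tab and the actual files are easy to miss by eye. The following is a minimal sketch of a check, assuming you have copied the names from the spreadsheet into a list and have a second list of the real file names. The function check_file_entries and its result keys are hypothetical. The comparison is case-sensitive, so data.zip and Data.zip are treated as different names.

```python
from pathlib import PurePath


def check_file_entries(listed: list[str], actual: list[str]) -> dict[str, list[str]]:
    """Compare names entered in the spreadsheet with the real file names."""
    actual_names = set(actual)
    return {
        # Names that contain a space
        "contains_spaces": [n for n in listed if " " in n],
        # Names entered without an extension such as .txt or .zip
        "missing_extension": [n for n in listed if PurePath(n).suffix == ""],
        # Names with no exactly matching file (case-sensitive)
        "not_found": [n for n in listed if n not in actual_names],
    }


if __name__ == "__main__":
    report = check_file_entries(["readme.txt", "data.zip"], ["readme.txt", "Data.zip"])
    for issue, names in report.items():
        print(issue, names)
```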

Sample dataset

For a good example, see this sample dataset deposited in the Research Data Leeds repository.

The researchers in this example selected the relevant data for their publication and considered how best to organise the data into related sections so that it is easy to display and navigate.

The Readme.txt file explains what files are included in the dataset and how they relate to each other.

The dataset has been assigned a DOI to provide a persistent link and enable formal citation and tracking like other types of scholarly publication.

The data is associated with a published paper in the Journal of the Mechanical Behavior of Biomedical Materials.