Organizing research data

5S methodology

The 5S method for workplace organization was originally developed in Japan in the mid-20th century for production systems and later adapted to the context of research data. It consists of five components: Sort, Set in Order, Shine, Standardize, and Sustain.

Part 1: Sort

Goal: Save time when searching and free up more storage space
Implementation: Delete unnecessary files and folders (or mark them for deletion) and keep temporary files from taking up space
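
As a minimal sketch of the "mark for deletion" step, the following Python code lists files whose extension suggests they are temporary, without deleting anything. The suffix list and the function names `is_temp_file` and `find_temp_files` are assumptions for illustration and should be adapted to the project.

```python
from pathlib import Path

# Hypothetical suffixes that often indicate temporary files; adjust as needed.
TEMP_SUFFIXES = {".tmp", ".bak", ".swp"}


def is_temp_file(path):
    """Return True if the file extension is in the list of temporary suffixes."""
    return Path(path).suffix.lower() in TEMP_SUFFIXES


def find_temp_files(root):
    """Return a sorted list of candidate files under root; nothing is deleted."""
    return sorted(
        path for path in Path(root).rglob("*")
        if path.is_file() and is_temp_file(path)
    )
```

The result is only a list of candidates for review; the decision to delete stays with the person responsible for the data.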

Part 2: Set in Order

Goal: Establish a system to streamline work processes
Implementation: Create logical folder structures and naming conventions; document structures and exceptions
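
One way to make a folder structure reproducible is to create it from a short script that also documents it. The sketch below assumes a hypothetical layout; the folder names and the function name `create_structure` are placeholders, not a prescribed standard.

```python
import tempfile
from pathlib import Path

# Hypothetical project layout; the folder names are placeholders.
FOLDERS = ["01_raw-data", "02_processed-data", "03_analysis", "04_documentation"]


def create_structure(root, folders=FOLDERS):
    """Create the folders under root and record the layout in a README file."""
    root = Path(root)
    for name in folders:
        (root / name).mkdir(parents=True, exist_ok=True)
    readme = root / "README.txt"
    readme.write_text(
        "Folder structure:\n" + "\n".join(folders) + "\n", encoding="utf-8"
    )
    return readme


# Demonstration in a throwaway directory.
with tempfile.TemporaryDirectory() as work:
    readme_path = create_structure(Path(work) / "my_project")
    created = sorted(p.name for p in readme_path.parent.iterdir() if p.is_dir())
```

Exceptions to the structure can be added to the same README file by hand.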

Part 3: Shine

Goal: Maintain quality, make adjustments as needed, document the process, and ensure it remains clear and understandable
Implementation: Monitor procedures, enforce them personally, and develop regular routines

Part 4: Standardize

Goal: Establish processes and deadlines and facilitate collaboration
Implementation: Document best practices, guidelines, and rules; define Standard Operating Procedures (SOPs); discuss with colleagues and clarify responsibilities

Part 5: Sustain

Goal: Maintain the system developed through self-discipline and habit; strive for automation and the use of templates
Implementation: Hold workshops and training sessions, onboard new employees, and adopt new methods and technologies

Source: Lang, K., Gerlach, R., Rex, J., Schröter, A., & Neute, N. (2025, April 30). Coffee Lecture Slides: 5S Data. Zenodo. doi:10.5281/zenodo.15310362

Source: 5S Methodology - Definition, Method, Benefits Explained (Lean Manufacturing Tools), Academic Gain Tutorials, 2024

Folder structures and file naming conventions

Effective and secure data management requires clear structures for organizing your data. This involves setting up systematic folder structures, establishing a consistent naming convention for folders and files, and selecting appropriate storage locations.

Meaningful file name components include, for example:

Title, describing the content
Initials of the author
Date in YYMMDD format
Version number (e.g., “v02”)

Example of a file name: <YYMMDD_Title_Measurement_Series_Author_Version>

Additional guidelines for naming conventions and folder structures:

Avoid spaces; use “-” or “_” instead
Avoid special characters, e.g., & * % € ? !
Avoid “umlauts”
Avoid generic names, e.g., “Record,” “Text”
Avoid long names
Avoid too many levels and parallel folders
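
The naming guidelines above can be sketched as a small helper that assembles a file name from its components. This is an illustration under assumptions: the function names `clean_component` and `build_file_name`, the choice of "-" within a component and "_" between components, and the transliteration of umlauts (e.g., "ü" to "ue") are not prescribed by the guidelines.

```python
import re
from datetime import date

# Transliteration of German umlauts and ß (an assumed convention).
UMLAUTS = str.maketrans(
    {"ä": "ae", "ö": "oe", "ü": "ue", "Ä": "Ae", "Ö": "Oe", "Ü": "Ue", "ß": "ss"}
)


def clean_component(text):
    """Replace umlauts, drop special characters, and turn spaces into hyphens."""
    text = text.translate(UMLAUTS)
    text = re.sub(r"[^A-Za-z0-9\s-]", "", text)  # also removes "_" (the separator)
    return re.sub(r"\s+", "-", text.strip())


def build_file_name(day, title, initials, version, extension):
    """Assemble YYMMDD_Title_Author_vNN plus the file extension."""
    parts = [
        day.strftime("%y%m%d"),
        clean_component(title),
        clean_component(initials),
        f"v{version:02d}",
    ]
    return "_".join(parts) + "." + extension.lstrip(".")


print(build_file_name(date(2025, 4, 30), "Münster soil & water", "KL", 2, "csv"))
# prints: 250430_Muenster-soil-water_KL_v02.csv
```

Generating names from one function keeps the convention consistent across a project, which is harder to guarantee when names are typed by hand.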

Versioning

Consistent versioning of research data, including its documentation, is essential for tracking and distinguishing between different stages of processing. It is particularly helpful for tracking changes in data and documents at key milestones.

Options for versioning:

in the file name, e.g., by appending “v01” or “final” to the end of the file name
in the file itself, e.g., by inserting a revision history at the beginning of the text document. Not all file formats allow for the addition of such a text section.
in a separate versioning document, i.e., a separate document that lists the information, changes, and, if applicable, the editors and date of the last change, etc. Such a document can also serve, for example, as a supplement to versioning in the file name to document which changes were made between two versions.
with version control software, e.g., Git. Git originated in software development and is particularly well suited to managing text-based files. Further information can be found on the University of Rostock’s GitLab service.
Versioning and change tracking are available for collaborative documents and storage locations; for example, versioning of documents can be enabled in SharePoint.
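
For versioning in the file name, the next version number can be derived automatically. The sketch below assumes that the version is written as "_vNN" at the end of the file name, directly before the extension; the function name `next_version_name` is hypothetical, and the function expects a bare file name without a directory.

```python
import re
from pathlib import Path

# Matches a trailing version marker such as "_v02" at the end of the stem.
VERSION_PATTERN = re.compile(r"_v(\d+)$")


def next_version_name(file_name):
    """Return file_name with its trailing _vNN incremented, or _v01 appended."""
    path = Path(file_name)
    match = VERSION_PATTERN.search(path.stem)
    if match is None:
        return f"{path.stem}_v01{path.suffix}"
    digits = match.group(1)
    # Keep the zero padding of the existing number (v02 -> v03, v09 -> v10).
    new_number = str(int(digits) + 1).zfill(len(digits))
    base = path.stem[: match.start()]
    return f"{base}_v{new_number}{path.suffix}"


print(next_version_name("250430_Title_KL_v02.csv"))
# prints: 250430_Title_KL_v03.csv
```

A helper like this does not replace a versioning document or Git; it only avoids gaps and typos in the numbering.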

Preferred formats for long-term archiving

Research data can only be reused if its file format is compatible with available tools. Proprietary file formats (i.e., manufacturer-specific and unpublished) can pose challenges for reuse after many years: users may need to purchase the often costly software or, in the worst case, reverse-engineer the format. It is therefore recommended to choose an open file format that can be used long-term. Some software can export or convert data to an open format; in that case, verify whether any information is lost in the process. If the conversion is not lossless, it is still recommended to store the data additionally in an open format, accompanied by documentation of the associated limitations, to keep reuse flexible. The following table contains recommendations for common data types:

Data type	Recommended	Avoid
Table	CSV, TSV, SPSS portable, ODS, XLSX	XLS, SPSS, NUMBERS
Text	TXT, HTML, RTF, PDF/A, DOCX, ODT	DOC, PDF, PAGES
Multimedia	Container: MP4, MKV, Codec: Theora, Dirac, FLAC, MPEG4	QuickTime, Flash
Picture	TIFF, JPEG2000, PNG, JPG	GIF, RAW, NEF, PSD, VSD
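
The recommendations in the table can be turned into a simple check by file extension. This is a rough sketch: the mapping from format names to extensions (e.g., ".jp2" for JPEG2000, ".tif" as a variant of ".tiff") is an assumption, and formats that cannot be identified by extension alone, such as PDF/A versus PDF or the codec inside a container, are deliberately reported as "unknown".

```python
from pathlib import Path

# Extensions derived from the table; the mapping is an assumption.
RECOMMENDED = {
    ".csv", ".tsv", ".ods", ".xlsx",
    ".txt", ".html", ".rtf", ".docx", ".odt",
    ".mp4", ".mkv",
    ".tiff", ".tif", ".jp2", ".png", ".jpg",
}
AVOID = {".xls", ".numbers", ".doc", ".pages", ".gif", ".nef", ".psd", ".vsd"}


def classify_format(file_name):
    """Return 'recommended', 'avoid', or 'unknown' based on the file extension."""
    suffix = Path(file_name).suffix.lower()
    if suffix in RECOMMENDED:
        return "recommended"
    if suffix in AVOID:
        return "avoid"
    return "unknown"


print(classify_format("results.xls"))
# prints: avoid
```

An "unknown" result means the file needs a manual check, not that the format is unsuitable.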

Backup Strategies

3-2-1 Backup

You should create three copies of your data, stored on two different types of media, and also back up one copy off-site.

Example: Original data: Stored safely on your laptop; First backup: A copy on an external hard drive; Second backup: An additional copy in the cloud.

Advantages:

Data security and minimization of data loss
Protection against physical hazards
Simplicity and flexibility

Tips for secure backups:

Backup storage devices should be physically separated from the infrastructure in use
Recommendation: Back up at least once a day and perform a full backup weekly
Data recovery should be tested at the start of the project and at regular intervals
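
A backup is only useful if the copies are intact. The following sketch copies a file to two backup folders and verifies each copy with a SHA-256 checksum. It is an illustration under assumptions: the function names `sha256_of` and `backup_file` are hypothetical, and the two target folders in the demonstration merely stand in for an external drive and an off-site location.

```python
import hashlib
import shutil
import tempfile
from pathlib import Path


def sha256_of(path):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def backup_file(source, destinations):
    """Copy source into each destination folder and verify every copy."""
    source = Path(source)
    expected = sha256_of(source)
    copies = []
    for folder in destinations:
        folder = Path(folder)
        folder.mkdir(parents=True, exist_ok=True)
        target = Path(shutil.copy2(source, folder / source.name))
        if sha256_of(target) != expected:
            raise IOError(f"Checksum mismatch for {target}")
        copies.append(target)
    return copies


# Demonstration in a throwaway directory; real backups would go to physically
# separate storage rather than to sibling folders.
with tempfile.TemporaryDirectory() as work:
    original = Path(work) / "measurements.csv"
    original.write_text("id,value\n1,0.5\n")
    targets = [Path(work) / "external_drive", Path(work) / "offsite"]
    copies = backup_file(original, targets)
    all_verified = all(sha256_of(c) == sha256_of(original) for c in copies)
```

Together with the original, the two verified copies correspond to the three copies of the 3-2-1 rule. Comparing checksums again at regular intervals is one simple way to test that the data can still be recovered.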

Project drive (ITMZ)

Central home directory (ITMZ)

Backup (ITMZ)

Templates for Writing Research Proposals

You can find text templates for describing the technical infrastructure on the page “Guidelines for DFG Applications.”

At a glance:

Service: