Organizing research data
5S methodology
The 5S method for workplace organization was originally developed in Japan for production systems and later adapted to the context of research data. It consists of five components: Sort, Set in Order, Shine, Standardize, and Sustain.
Part 1: Sort
Goal: Save time when searching for files and free up storage space
Implementation: Delete unnecessary files and folders (or mark them for deletion) and keep temporary files from taking up space
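The "mark for deletion" step can be partly automated. The sketch below assumes a hypothetical list of temporary-file patterns (Office lock files, `.tmp`, `.bak`, editor backups, OS metadata files) and a hypothetical holding folder named `_to_delete`. Instead of deleting anything, it moves matching files there so that someone can review them before final deletion.

```python
from fnmatch import fnmatch
from pathlib import Path
import shutil

# Hypothetical patterns for typical temporary/system files; adapt to your project.
TEMP_PATTERNS = ["~$*", "*.tmp", "*.bak", "*~", ".DS_Store", "Thumbs.db"]
HOLDING_DIR = "_to_delete"  # hypothetical name for the "marked for deletion" folder


def find_temp_files(root, patterns=TEMP_PATTERNS):
    """Return files under root whose names match any temporary-file pattern."""
    root = Path(root)
    hits = []
    for path in root.rglob("*"):
        # Skip anything that has already been moved into the holding folder.
        if HOLDING_DIR in path.relative_to(root).parts:
            continue
        if path.is_file() and any(fnmatch(path.name, p) for p in patterns):
            hits.append(path)
    return sorted(hits)


def mark_for_deletion(root, patterns=TEMP_PATTERNS):
    """Move matching files into root/_to_delete, keeping their relative paths."""
    root = Path(root)
    moved = []
    for path in find_temp_files(root, patterns):
        target = root / HOLDING_DIR / path.relative_to(root)
        target.parent.mkdir(parents=True, exist_ok=True)
        shutil.move(str(path), str(target))
        moved.append(target)
    return moved
```

Moving files rather than deleting them keeps the step reversible, which matches the "or mark them for deletion" option above.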
Part 2: Set in Order
Goal: Establish a system to streamline work processes
Implementation: Create logical folder structures and naming conventions; document structures and exceptions
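A logical folder structure is easiest to keep consistent when it is created from a template. The following is a minimal sketch; the folder names and their descriptions are hypothetical examples, not a prescribed layout. It also writes a `README.txt` that documents the structure and leaves space for exceptions.

```python
from pathlib import Path

# Hypothetical template; numeric prefixes keep folders in a fixed order when sorted.
FOLDER_TEMPLATE = {
    "01_raw-data": "Original, unmodified data (read-only)",
    "02_processed-data": "Cleaned and converted data",
    "03_analysis": "Scripts and analysis outputs",
    "04_results": "Figures, tables, reports",
    "05_documentation": "Protocols, codebooks, SOPs",
}


def create_project_structure(root, template=FOLDER_TEMPLATE):
    """Create the template folders and a README that documents them."""
    root = Path(root)
    for name in template:
        # exist_ok=True makes repeated runs harmless.
        (root / name).mkdir(parents=True, exist_ok=True)
    lines = ["Folder structure", ""]
    lines += [f"{name}: {purpose}" for name, purpose in template.items()]
    lines += ["", "Exceptions: (document deviations from this structure here)"]
    (root / "README.txt").write_text("\n".join(lines) + "\n", encoding="utf-8")
    return sorted(p.name for p in root.iterdir() if p.is_dir())
```

Keeping the template in one place also supports the later goals of standardization and the use of templates.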
Part 3: Shine
Goal: Maintain quality, make adjustments as needed, document the process, and ensure it remains clear and understandable
Implementation: Monitor procedures, apply them consistently yourself, and develop regular routines
Part 4: Standardize
Goal: Establish processes and deadlines and facilitate collaboration
Implementation: Document best practices, guidelines, and rules; define Standard Operating Procedures (SOPs); discuss with colleagues and clarify responsibilities
Part 5: Sustain
Goal: Maintain the established system through self-discipline and habit; strive for automation and the use of templates
Implementation: Workshops and training sessions, onboarding of new employees, and adoption of new methods and technologies
Source: Lang, K., Gerlach, R., Rex, J., Schröter, A., & Neute, N. (2025, April 30). Coffee Lecture Slides: 5S Data. Zenodo. https://doi.org/10.5281/zenodo.15310362
Source: 5S Methodology - Definition, Method, Benefits Explained (Lean Manufacturing Tools), Academic Gain Tutorials, 2024
Folder structures and file naming conventions
Effective and secure data management requires clear structures for organizing your data. This includes systematic folder structures, a consistent naming convention for folders and files, and appropriate storage locations.
Meaningful file name components include, for example:
- Title, describing the content
- Initials of the author
- Date in YYMMDD format
- Version number (e.g., “v02”)
Example of a file name: <YYMMDD_Title_Measurement_Series_Author_Version>
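The naming pattern can be applied mechanically so that every file follows it. This is a minimal sketch under two assumptions: underscores separate the components, and hyphens join words within a component (so "Soil moisture" becomes `Soil-moisture`). The function name and the example values are hypothetical.

```python
from datetime import date


def build_file_name(day, title, series, initials, version, extension):
    """Assemble <YYMMDD_Title_Series_Initials_vNN.ext>.

    Assumption: '_' separates components, '-' joins words inside a component.
    """
    parts = [
        day.strftime("%y%m%d"),       # date in YYMMDD format
        "-".join(title.split()),      # spaces -> hyphens
        "-".join(series.split()),
        initials.upper(),             # author initials
        f"v{version:02d}",            # two-digit version, e.g. v02
    ]
    return "_".join(parts) + "." + extension.lstrip(".")


print(build_file_name(date(2025, 4, 30), "Soil moisture", "Series A", "jr", 2, "csv"))
# 250430_Soil-moisture_Series-A_JR_v02.csv
```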
Additional guidelines for naming conventions and folder structures:
- Avoid spaces; use “-” or “_” instead
- Avoid special characters, e.g., & * % € ? !
- Avoid umlauts and similar characters (e.g., ä, ö, ü, ß)
- Avoid generic names, e.g., “Record,” “Text”
- Avoid long names
- Avoid deeply nested folder hierarchies and too many parallel folders
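The guidelines above can be checked automatically, for example as part of a regular routine. The sketch below is one possible interpretation: it allows only ASCII letters, digits, `.`, `_` and `-` (which also excludes umlauts), and it uses a hypothetical list of generic names plus hypothetical limits for name length and folder depth. Adjust these values to your own conventions.

```python
import re
from pathlib import PurePosixPath

ALLOWED = re.compile(r"[A-Za-z0-9._-]+")   # ASCII only: rules out umlauts and & * % € ? !
GENERIC_STEMS = {"record", "text", "data", "new", "document"}  # hypothetical list
MAX_NAME_LENGTH = 60   # hypothetical limit
MAX_DEPTH = 5          # hypothetical limit on folder levels


def check_path(path):
    """Return guideline violations for a relative path such as 'a/b/file.csv'."""
    problems = []
    parts = PurePosixPath(path).parts
    if len(parts) - 1 > MAX_DEPTH:
        problems.append(f"too many folder levels ({len(parts) - 1})")
    for part in parts:
        if " " in part:
            problems.append(f"space in '{part}'")
        elif not ALLOWED.fullmatch(part):
            problems.append(f"special character or umlaut in '{part}'")
        if PurePosixPath(part).stem.lower() in GENERIC_STEMS:
            problems.append(f"generic name '{part}'")
        if len(part) > MAX_NAME_LENGTH:
            problems.append(f"name too long: '{part}'")
    return problems


print(check_path("Text.docx"))  # ["generic name 'Text.docx'"]
```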
Versioning
Consistent versioning of research data, including its documentation, is essential for tracking and distinguishing between different stages of processing. It is particularly helpful for tracking changes in data and documents at key milestones.
Options for versioning:
- in the file name, e.g., by appending “v01” or “final” to the end of the file name
- in the file itself, e.g., by inserting a revision history at the beginning of the text document. Not all file formats allow for the addition of such a text section.
- in a separate versioning document, i.e., a document that lists the changes and, if applicable, the editor and date of each change. Such a document can also supplement versioning in the file name by recording which changes were made between two versions.
- with version control software, e.g., Git. Git originated in software development and is particularly well suited for managing text-based files. Further information can be found on the University of Rostock’s GitLab service.
- in collaborative documents and storage locations that offer versioning and change tracking; for example, versioning of documents can be enabled in SharePoint.
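Versioning in the file name and a separate versioning document can be combined. The following minimal sketch assumes that the version tag has the form `_vNN` directly before the file extension. It also assumes a hypothetical plain-text changelog in which each line records date, old name, new name, editor, and a short description of the change.

```python
import re
from datetime import date
from pathlib import Path

# Assumes names like "..._v02.csv": version tag directly before the extension.
VERSION_RE = re.compile(r"_v(\d+)(\.[^.]*)?$")


def next_version_name(name):
    """'..._v02.csv' -> '..._v03.csv', keeping the number of digits."""
    m = VERSION_RE.search(name)
    if m is None:
        raise ValueError(f"no _vNN version tag in {name!r}")
    digits = m.group(1)
    new_number = str(int(digits) + 1).zfill(len(digits))
    return name[:m.start()] + f"_v{new_number}" + (m.group(2) or "")


def log_change(changelog, old_name, new_name, editor, note, day=None):
    """Append one line to a separate versioning document (hypothetical format)."""
    if day is None:
        day = date.today()
    line = f"{day.isoformat()} | {old_name} -> {new_name} | {editor} | {note}\n"
    with Path(changelog).open("a", encoding="utf-8") as fh:
        fh.write(line)
    return line
```

For example, `next_version_name("250430_Soil-moisture_Series-A_JR_v02.csv")` returns `"250430_Soil-moisture_Series-A_JR_v03.csv"`. For text-based files that change often, Git records the same information with much less manual effort.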
Preferred formats for long-term archiving
Reusing research data requires that its file format be compatible with existing tools. Proprietary (i.e., manufacturer-specific, unpublished) file formats in particular can hinder reuse after many years: users may need to purchase often costly software or, in the worst case, reverse-engineer the format before they can reuse the data. It is therefore recommended to choose an open file format that is suitable for long-term use. Some software can export or convert data to an open format, but you should check whether any information is lost in the process. Even if the conversion is not lossless, it is still advisable to store an additional copy in an open format, together with documentation of any limitations, so that the data remain flexible to reuse. The following table contains recommendations for common data types:
| Data type | Recommended | Avoid |
|---|---|---|
| Table | CSV, TSV, SPSS portable, ODS, XLSX | XLS, SPSS, NUMBERS |
| Text | TXT, HTML, RTF, PDF/A, DOCX, ODT | DOC, PDF, PAGES |
| Multimedia | Containers: MP4, MKV; codecs: Theora, Dirac, FLAC, MPEG-4 | QuickTime, Flash |
| Picture | TIFF, JPEG2000, PNG, JPG | GIF, RAW, NEF, PSD, VSD |
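To find files in a project that may need conversion, you can compare their extensions with the table. The sketch below relies on an assumed mapping from format names to extensions (for example, SPSS portable to `.por`, SPSS to `.sav`, QuickTime to `.mov`, Flash to `.swf`/`.flv`). Extensions cannot distinguish PDF/A from plain PDF, and they cannot reveal which codec a video uses, so PDFs are flagged for a manual check and codecs are not examined.

```python
from pathlib import Path

# Assumed extension mapping derived from the table; not an official list.
RECOMMENDED = {".csv", ".tsv", ".por", ".ods", ".xlsx",
               ".txt", ".html", ".rtf", ".docx", ".odt",
               ".mp4", ".mkv",
               ".tif", ".tiff", ".jp2", ".png", ".jpg", ".jpeg"}
AVOID = {".xls", ".sav", ".numbers", ".doc", ".pages",
         ".mov", ".swf", ".flv",
         ".gif", ".raw", ".nef", ".psd", ".vsd"}
NEEDS_CHECK = {".pdf"}  # PDF/A and plain PDF share the same extension


def classify(path):
    """Classify a file by extension: recommended, avoid, check, or unknown."""
    ext = Path(path).suffix.lower()
    if ext in RECOMMENDED:
        return "recommended"
    if ext in AVOID:
        return "avoid"
    if ext in NEEDS_CHECK:
        return "check"
    return "unknown"


def report(paths):
    """Group paths by classification, e.g. to list files that need converting."""
    groups = {}
    for p in paths:
        groups.setdefault(classify(p), []).append(str(p))
    return groups
```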
Backup strategies
3-2-1 Backup
Keep three copies of your data (the original and two backups), store them on at least two different types of media, and keep at least one copy off-site.
Example: Original data: stored on your laptop; first backup: a copy on an external hard drive; second backup: an additional copy in the cloud (off-site).
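The three conditions of the rule can be written as a simple check over an inventory of your copies. In this minimal sketch, the `Copy` record and its fields are hypothetical. The media labels are free text, so two copies count as different media only if you label them differently.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Copy:
    location: str   # free-text description, e.g. "laptop"
    medium: str     # e.g. "internal drive", "external drive", "cloud storage"
    offsite: bool   # stored away from the primary site?


def check_321(copies):
    """Return the unmet 3-2-1 conditions (an empty list means the rule is met)."""
    issues = []
    if len(copies) < 3:
        issues.append(f"only {len(copies)} copies, need 3")
    media = {c.medium for c in copies}
    if len(media) < 2:
        issues.append(f"only {len(media)} media type(s), need 2")
    if not any(c.offsite for c in copies):
        issues.append("no off-site copy")
    return issues


example = [
    Copy("laptop", "internal drive", False),
    Copy("external hard drive", "external drive", False),
    Copy("cloud", "cloud storage", True),
]
print(check_321(example))  # []
```

Without the cloud copy, the check reports both the missing third copy and the missing off-site copy.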
Advantages:
- Data security and minimization of data loss
- Protection against physical hazards
- Simplicity and flexibility
Tips for secure backups:
- Backup storage devices should be physically separated from the infrastructure in use
- Recommendation: Back up at least once a day and perform a full backup weekly
- Data recovery should be tested at the start of the project and at regular intervals
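One way to test data recovery is to restore a backup to a separate location and compare it file by file with the original. The following sketch does this with SHA-256 checksums from Python's `hashlib`. It assumes that both directory trees are accessible locally, and it reports files that are missing or changed in the restored copy.

```python
import hashlib
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Compute a file's SHA-256 checksum, reading it in 1 MiB chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def manifest(root):
    """Map each file's relative path to its SHA-256 checksum."""
    root = Path(root)
    return {p.relative_to(root).as_posix(): sha256_of(p)
            for p in root.rglob("*") if p.is_file()}


def verify_restore(original_root, restored_root):
    """Compare a test restore with the original and list the problems found."""
    orig, restored = manifest(original_root), manifest(restored_root)
    problems = [f"missing: {k}" for k in sorted(orig.keys() - restored.keys())]
    problems += [f"changed: {k}" for k in sorted(orig.keys() & restored.keys())
                 if orig[k] != restored[k]]
    return problems
```

An empty result means that every original file was restored with identical content. Saving the manifest alongside the backup also lets you repeat the check later.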

