Tutorial: Checklist for working with open data
All data generated in publicly funded research projects should be available to the public and the modeling community for reuse. Ideally, all six of the following points should be fulfilled:
- All data should be available in machine-readable formats, meaning that they are structured so that a computer can process the information they contain (as opposed to just the letters and the styling). File formats that are generally suitable for this are CSV, JSON, XML and RDF. Databases are suitable as well. Unsuitable file formats include PDF files, Word documents and presentations.
- All data should be accompanied by metadata.
- Detailed and up-to-date documentation should be provided, including information about measurement and estimation techniques.
- The final data should be published under an established open license to facilitate analysis and reuse. We recommend a public domain dedication; where this is not possible, an attribution license such as Creative Commons Attribution 4.0 is the second-best choice. See also Open Data Licenses.
- The data should be published through centralized platforms that permit bulk download and provide web API access. The data should be version controlled and permanently available, preferably through a Digital Object Identifier.
- All uploaded data from third parties should indicate who holds the intellectual property rights, so that this is transparent to users.
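Several of the points above can be checked mechanically. The following is a minimal sketch, assuming a metadata record stored alongside the data. The field names (`license`, `rights_holder`, `doi`, and so on), the helper `missing_fields`, and the sample values are hypothetical and chosen only for illustration; they do not follow any particular metadata standard.

```python
import csv
import io

# Hypothetical metadata record; field names and values are illustrative only.
metadata = {
    "title": "Example hourly load data",
    "license": "CC-BY-4.0",               # established open license
    "rights_holder": "Example Institute",  # who holds the IP rights
    "version": "2024-01-15",               # version of this release
    "doi": None,                            # no persistent identifier yet
    "documentation": "Describes measurement and estimation techniques.",
}

# Fields this sketch treats as required (an assumption, not a standard).
REQUIRED_FIELDS = (
    "title", "license", "rights_holder", "version", "doi", "documentation",
)


def missing_fields(record):
    """Return the required fields that are absent or empty in the record."""
    return [field for field in REQUIRED_FIELDS if not record.get(field)]


# Machine-readable data: a CSV string stands in for a downloaded file.
# The numbers are made up for the example.
raw = (
    "timestamp,load_mw\n"
    "2024-01-01T00:00Z,41230\n"
    "2024-01-01T01:00Z,39870\n"
)
rows = list(csv.DictReader(io.StringIO(raw)))

# Because the file is structured, values can be processed directly.
total_load = sum(float(row["load_mw"]) for row in rows)

print("Missing metadata fields:", missing_fields(metadata))
print("Rows read:", len(rows), "Total load (MW):", total_load)
```

The same calculation would not be possible without manual transcription if the values were only available in a PDF or a presentation slide, which is why such formats are listed as unsuitable.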
More detailed information on these topics can be found in Open Data for Electricity Modeling.