The following checklist helps you prepare your data so that uploading it to the OEP causes less trouble. The main goal is data that is machine-readable and, at the same time, easy for humans to understand.
- Only work with tables
- If you work with Excel: use only ONE table per Excel worksheet (see figure poor example and good example for how to split one table into multiple tables)
- Use English table names
- Use consistent file names. This will facilitate the upload.
a. Good example: energyloadgermanypv, energyloadgermanywind, hence: energyloadgermany*
b. Poor example: energypvload, windenergieDeutschlandversion3
- Try to keep all files in one folder, or in folders with consistent names (following the same naming rule as for file names above), so that they are easily machine-readable
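With consistent names, a single pattern selects all related files. The following is a minimal sketch using Python's `fnmatch` module and the example names from the list above; for files on disk, `glob.glob` with the same pattern would do the equivalent.

```python
from fnmatch import fnmatchcase

# File names taken from the good and poor examples above.
names = [
    "energyloadgermanypv",
    "energyloadgermanywind",
    "energypvload",
    "windenergieDeutschlandversion3",
]

# One pattern is enough to pick up every consistently named file.
pattern = "energyloadgermany*"
matching = [name for name in names if fnmatchcase(name, pattern)]

print(matching)
```

The two inconsistently named files are not matched, so each of them would need its own handling in an upload script.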
- Follow the conventions for columns:
a. Each column holds only one kind of entry (see figure good example)
b. Use descriptive ("speaking") names (see figure good example)
i. Good example: energyloadgermanypv
ii. Poor example: lpvvs4
c. Follow naming conventions, e.g. SI units, where possible
d. Transpose your file if possible (e.g. when years are used as columns); see figure poor example and good example. A tool that helps with this can be found at: #OpenEnergyPlatform/oeplatform/issues/350
e. Never start column names with a number (this will not work during upload); see figure good example
f. Use English column names
g. Make sure each column contains only one specific data type: string (e.g. “this is a string”), integer (e.g. 1, 2, -5), float (e.g. 1.544) … see figure good example
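The transposition described in item d can be sketched in plain Python. This is only an illustration, not the tool referenced above; the column names `region`, `year` and `value` and the sample numbers are invented for the example.

```python
def wide_to_long(rows, id_column, name_column="year", value_column="value"):
    """Turn one column per year into one row per (id, year) pair."""
    long_rows = []
    for row in rows:
        for key, cell in row.items():
            if key == id_column:
                continue
            long_rows.append({
                id_column: row[id_column],
                name_column: int(key),  # the former column header becomes a value
                value_column: cell,
            })
    return long_rows


# Hypothetical wide table: years are used as column headers.
wide = [
    {"region": "north", "2015": 1.5, "2016": 1.7},
    {"region": "south", "2015": 2.1, "2016": 2.4},
]

long_table = wide_to_long(wide, id_column="region")
for record in long_table:
    print(record)
```

After the reshaping, no column name starts with a number, and each column holds a single data type (string, integer, float).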
- Optional: you might want to set Excel to use ‘.’ (English) as the decimal separator instead of ‘,’ (German). Open question: what is compulsory for the OEP?
- Think about what you mean by missing values: if you implicitly mean zero, enter ‘0’. If you mean that no data were available, set NODATA. Open question: what is the OEP convention? Or should the cell be left empty?
- Make sure you have one or more primary keys so that each entry is unique. A primary key is a unique identifier.
a. Example: first name, surname, passportnr
The passportnr is a primary key because it is unique and identifies each person.
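Whether a column really works as a primary key can be checked before the upload. A minimal sketch, assuming the table is available as a list of dictionaries; the helper `duplicate_keys` and the sample persons are invented for illustration.

```python
from collections import Counter


def duplicate_keys(rows, key_columns):
    """Return the key values that occur more than once."""
    counts = Counter(tuple(row[col] for col in key_columns) for row in rows)
    return [key for key, n in counts.items() if n > 1]


# Hypothetical sample data following the example above.
persons = [
    {"first_name": "Anna", "surname": "Schmidt", "passportnr": "P001"},
    {"first_name": "Ben", "surname": "Schmidt", "passportnr": "P002"},
    {"first_name": "Anna", "surname": "Meyer", "passportnr": "P003"},
]

# An empty result means every row is uniquely identified by these columns.
print(duplicate_keys(persons, ["passportnr"]))
# A non-empty result lists the values that repeat, so the column is not a key.
print(duplicate_keys(persons, ["surname"]))
```

Passing several column names checks a composite key, e.g. `["first_name", "surname"]`.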
- For dates:
a. Always use the same format
b. Think about your time zone
c. For time intervals, decide whether the timestamp marks the left edge (start), the right edge (end) or the middle of the interval
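One way to address all three points is to write timestamps in a single format with an explicit UTC offset and to state the interval convention alongside the data. The sketch below uses only the Python standard library; the choice of ISO 8601, the fixed offset UTC+01:00 and the hourly interval are assumptions for illustration, not OEP requirements.

```python
from datetime import datetime, timedelta, timezone

# Fixed offset UTC+01:00, assumed for this example.
CET = timezone(timedelta(hours=1))


def interval_stamps(start, length):
    """Return the left, middle and right stamp of one interval as ISO 8601 strings."""
    return {
        "left": start.isoformat(),
        "middle": (start + length / 2).isoformat(),
        "right": (start + length).isoformat(),
    }


start = datetime(2020, 1, 1, 0, 0, tzinfo=CET)
stamps = interval_stamps(start, timedelta(hours=1))
print(stamps)

# The same instant expressed in UTC; the offset makes the conversion unambiguous.
print(start.astimezone(timezone.utc).isoformat())
```

The three stamps describe the same hour, so a table without a stated convention can be off by up to one interval length when combined with other data.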
- Implement your model so that the results always have the same format: this makes working with the OEP easier, and your own post-processing as well
- If you have multiple tables, make sure they can be linked to one another
a. Example: a table of persons with first name and surname, and a table of cars with the car owner's surname. See also figure good example.
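Linking works most reliably through a key column that both tables share. The sketch below uses Python's built-in `sqlite3` module as a stand-in database; the table names, column names and sample rows are invented, and linking via `passportnr` instead of the surname is an assumption made here because surnames can repeat.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("PRAGMA foreign_keys = ON")  # let SQLite enforce the link

con.execute(
    "CREATE TABLE person ("
    "passportnr TEXT PRIMARY KEY, first_name TEXT, surname TEXT)"
)
con.execute(
    "CREATE TABLE car ("
    "plate TEXT PRIMARY KEY, "
    "owner_passportnr TEXT REFERENCES person(passportnr))"
)

# Hypothetical sample rows; both persons share a surname on purpose.
con.executemany(
    "INSERT INTO person VALUES (?, ?, ?)",
    [("P001", "Anna", "Schmidt"), ("P002", "Ben", "Schmidt")],
)
con.executemany(
    "INSERT INTO car VALUES (?, ?)",
    [("B-AB 123", "P001"), ("HH-XY 42", "P002")],
)

# Join the two tables through the shared key column.
rows = con.execute(
    "SELECT car.plate, person.first_name, person.surname "
    "FROM car JOIN person ON car.owner_passportnr = person.passportnr "
    "ORDER BY car.plate"
).fetchall()
print(rows)
```

With the surname as the only link, both cars would match both persons; the unique key keeps each car tied to exactly one owner.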
The figures show a poor and a good example. They give an idea of how to solve these problems; there are many good ways to do so, and two possibilities are shown here.
In the good example there are two options for splitting the time series.
- This option is used when there is data for many wind turbines and solar parks:
- This option is used when there are not as many wind turbines and solar parks: