OEP_Upload_Process_Data_and_Metadata

OpenEnergy Platform

OpenEnergyPlatform



Tutorial - How Upload Your Data and Metadata to the OEP

The development of tutorials for the Open Energy Family takes place publicly in a dedicated tutorial repository.
Please report bugs and suggestions as new issues.

license: GNU Affero General Public License Version 3 (AGPL-3.0)
copyright: Reiner Lemoine Institut
authors: christian-rli, jh-RLI, Ludee

If Jupyter Notebooks are new to you and you'd like to get an introduction, have a look at this less than 10 minute introduction video. Official installation instructions are available on jupyter's readthedocs page.

Introduction

This resource will go through the technical process of uploading data to the OEDB. It uses example data from a csv file and corresponding metadata to illustrate the process. In order to replicate it with your own data in a jupyter notebook, you can use this empty upload template with the same structure.

If you need more context on the used tools and how to install them, have a look at the Upload Process Guidebook.

Setup

You need to be signed up to the OEP to access your username and API Token. To run this Jupyter Notebook you need to create an execution environment with all the following dependencies installed: oem2orm, pandas, requests, oep_client, oedialect.

Uploading Process

Import Dependencies

We will start out by creating a connection to the OEP, reading in our metadata files and creating empty tables from these. For these steps we need to import oem2orm, pandas, os and getpass.

In [ ]:
from oem2orm import oep_oedialect_oem2orm as oem2orm
import os
import pandas as pd
import getpass

Setting up the oem2orm logger

If you want to see detailed runtime information on oem2orm functions or if errors occur, you can activate the logger with this simple setup function.

In [ ]:
oem2orm.setup_logger()

Connection to OEP

To connect to the OEP you need your OEP Token and user name. Note: You can view your token on your OEP profile page after logging in. The following command will prompt you for your token and store it as an environment variable. When you paste it here, it will only show dots instead of the actual string.

In [ ]:
os.environ["OEP_TOKEN"] = getpass.getpass('Token:')

Provide your OEP-username to oem2orm in order to create a connection to the database. Your token is taken from the environment variable you've created above. Note: Using white space in your name is fine.

In [ ]:
db = oem2orm.setup_db_connection()

Creating sql tables from oemetadata

The oemetadata format is a standardised json file format and required for all data uploaded to the OEP. It includes the data model and the used data types. This allows us to derive the necessary tables in sqlalchemy from it.

In order to create the table(s) we need to tell python where to find our oemetadata file first. To do this we place them in the folder "metadata" which is in the current directory (Path of this jupyter notebbok). Provide the path to your own folder if you want to use your own metadata. oem2orm will process all files that are located in the folder.

In [ ]:
metadata_folder = oem2orm.select_oem_dir(oem_folder_name="metadata")

The next command will set up the table. The collect_tables_function collects all metadata files in a folder and retrives the SQLAlchemy ORM objects and returns them. The Tables are ordered by foreign key. Having a valid metadata strings is necessary for the following steps.

INFO: The red output is information printed by the logger. It does not mean that an error has occurred.
In [ ]:
tables_orm = oem2orm.collect_tables_from_oem(db, metadata_folder)

Now we can use create our table objects in the database.

In [ ]:
#create table
oem2orm.create_tables(db, tables_orm)

The tables should now be public, but empty on the OEP at the location provided in the metadata file. For this example tutorial, the created table is located in model_draft.upload_tutorial_example_data. If you've just been playing around and don't want to write any data to the OEP, please make sure to delete your tables again.

Deleting your table

Running the following commands will delete the tables from the database which you have in your ordered ORM. Take care that you only delete tables you actually want to have removed.

Skip this command, if you want to keep your table!

In [ ]:
# In order to actually delete, you will need to type: yes
oem2orm.delete_tables(db, tables_orm)

Writing data into a table

In this example we will upload data from a csv file. Pandas has a read_csv function which makes importing a csv-file rather comfortable. It reads csv into a DataFrame. By default, it assumes that the fields are comma-separated. Our example file has columns with semicolons as separators, so we have to specify this when reading the file.

The example file for this tutorial ('upload_tutorial_example_data.csv') is in the 'data' directory, next to this tutorial. Make sure to adapt the path to the file you're using if your file is located elsewhere.

In [ ]:
# db = oem2orm.setup_db_connection()

filepath = "./data/upload_tutorial_example_data.csv"
example_df = pd.read_csv(filepath, encoding='utf8', sep=';')

# show the first 10 row┬┤s
example_df[:10]

We need to define the location in the OEDB where the data should be written to. The connection information is still available from our steps above.

In [ ]:
schema = "model_draft"
table_name = "upload_tutorial_example_data"
connection = db.engine

The following command will write the content of your dataframe to the table on the OEP that was created earlier.
Have a look in the OEP after it ran succesfully!

In [ ]:
try: 
    example_df.to_sql(table_name, connection, schema=schema, if_exists='append', index=False)
    
    print('Inserted data to ' + schema + '.' + table_name)
except Exception as e:
    # session.rollback()
    print('Writing to ' + table_name + ' failed!')
    print('Note that you cannot load the same data into the table twice. There will be an id conflict.')
    print('Delete and recreate with the commands above, if you want to test your upload again.')

Writing metadata to the table

Now that we have data in our table it's high time, that we attach our metadata to it. Since we're using the api, some direct http-requests and a little helper function from the oep-client, we need to import these new dependencies.

In [ ]:
import json
import requests
from oep_client import OepClient

The token needs to be passed from the environment variable to the oep-client.

In [ ]:
cl = OepClient(token=os.environ["OEP_TOKEN"])

First we're reading the metadata file into a json dictionary.

In [ ]:
with open('metadata/example_metadata.json') as json_file:
    data = json.load(json_file)
In [ ]:
cl.update_metadata(metadata=data)

If you still have the page on the OEP with your data open, refresh it. It should now show you the metadata on its side.

If you find bugs or if you have ideas to improve the Open Energy Platform, you are welcome to add your comments to the existing issues on GitHub.
You can also fork the project and get involved.

Please note that the platform is still under construction and therefore the design of this page is still highly volatile!