Tutorial: OEP Upload Process for data and metadata
Tutorial - How Upload Your Data and Metadata to the OEPPermalink
license: GNU Affero General Public License Version 3 (AGPL-3.0)
copyright: Reiner Lemoine Institut
authors: christian-rli, jh-RLI, Ludee
If Jupyter Notebooks are new to you and you'd like to get an introduction, have a look at this less than 10 minute introduction video. Official installation instructions are available on jupyter's readthedocs page.
This resource will go through the technical process of uploading data to the OEDB. It uses example data from a csv file and corresponding metadata to illustrate the process. In order to replicate it with your own data in a jupyter notebook, you can use this empty upload template with the same structure.
If you need more context on the used tools and how to install them, have a look at the Upload Process Guidebook.
You need to be signed up to the OEP to access your username and API Token. To run this Jupyter Notebook you need to create an execution environment with all the following dependencies installed:
from oem2orm import oep_oedialect_oem2orm as oem2orm import os import pandas as pd import getpass
Setting up the oem2orm loggerPermalink
If you want to see detailed runtime information on oem2orm functions or if errors occur, you can activate the logger with this simple setup function.
Connection to OEPPermalink
To connect to the OEP you need your OEP Token and user name. Note: You ca view your token on your OEP profile page after logging in. The following command will prompt you for your token and store it as an environment variable. When you paste it here, it will only show dots instead of the actual string.
os.environ["OEP_TOKEN"] = getpass.getpass('Token:')
Provide your OEP-username to oem2orm in order to create a connection to the database. Your token is taken from the environment variable you've created above. Note: Using white space in your name is fine.
db = oem2orm.setup_db_connection()
Creating sql tables from oemetadataPermalink
The oemetadata format is a standardised json file format and required for all data uploaded to the OEP. It includes the data model and the used data types. This allows us to derive the necessary tables in sqlalchemy from it.
In order to create the table(s) we need to tell python where to find our oemetadata file first. To do this we place them in the folder "metadata" which is in the current directory (Path of this jupyter notebbok). Provide the path to your own folder if you want to use your own metadata. oem2orm will process all files that are located in the folder.
metadata_folder = oem2orm.select_oem_dir(oem_folder_name="metadata")
The next command will set up the table. The collect_tables_function collects all metadata files in a folder and retrives the SQLAlchemy ORM objects and returns them. The Tables are ordered by foreign key. Having a valid metadata strings is necessary for the following steps.
tables_orm = oem2orm.collect_tables_from_oem(db, metadata_folder)
Now we can use create our table objects in the database.
#create table oem2orm.create_tables(db, tables_orm)
The tables should now be public, but empty on the OEP at the location provided in the metadata file. For this example tutorial, the created table is located in model_draft.upload_tutorial_example_data. If you've just been playing around and don't want to write any data to the OEP, please make sure to delete your tables again.
Deleting your tablePermalink
Running the following commands will delete the tables from the database which you have in your ordered ORM. Take care that you only delete tables you actually want to have removed.
# In order to actually delete, you will need to type: yes oem2orm.delete_tables(db, tables_orm)
Writing data into a tablePermalink
In this example we will upload data from a csv file. Pandas has a read_csv function which makes importing a csv-file rather comfortable. It reads csv into a DataFrame. By default, it assumes that the fields are comma-separated. Our example file has columns with semicolons as separators, so we have to specify this when reading the file.
The example file for this tutorial ('upload_tutorial_example_data.csv') is in the 'data' directory, next to this tutorial. Make sure to adapt the path to the file you're using if your file is located elsewhere.
# db = oem2orm.setup_db_connection() filepath = "./data/upload_tutorial_example_data.csv" example_df = pd.read_csv(filepath, encoding='utf8', sep=';') # show the first 10 row´s example_df[:10]
We need to define the location in the OEDB where the data should be written to. The connection information is still available from our steps above.
schema = "model_draft" table_name = "upload_tutorial_example_data" connection = db.engine
The following command will write the content of your dataframe to the table on the OEP that was created earlier.
Have a look in the OEP after it ran succesfully!
try: example_df.to_sql(table_name, connection, schema=schema, if_exists='append', index=False) print('Inserted data to ' + schema + '.' + table_name) except Exception as e: print('Writing to ' + table_name + ' failed!') print('Note that you cannot load the same data into the table twice. There will be an id conflict.') print('Delete and recreate with the commands above, if you want to test your upload again.')
Now that we have data in our table it's high time, that we attach our metadata to it. Since we're using the api, some direct http-requests and a little helper function from the oep-client, we need to import these new dependencies.
import json import requests
We use the metadata folder we set up before. (See the Creating tables section) If you wan´t to set another folder use the code below:
# oem_path = oem2orm.select_oem_dir(oem_folder_name="metadata") md_file_name = "example_metadata"
First we're reading the metadata file into a json dictionary.
metadata = oem2orm.mdToDict(oem_folder_path=metadata_folder, file_name=md_file_name)
Then we need to validate the metadata.
Now we can upload the metadata.
If you still have the page on the OEP with your data open, refresh it. It should now show you the metadata on its side.