Validate Metadata - Datapackage

In [ ]:
__copyright__ = "Reiner Lemoine Institut"
__license__   = "GNU Affero General Public License Version 3 (AGPL-3.0)"
__url__       = "https://www.gnu.org/licenses/agpl-3.0.html"
__author__    = "christian-rli, Ludee"
In [1]:
from datapackage import Package
import pprint as pp


Frictionless Data offers the Python package datapackage-py for working with data packages and validating a metadata string.

  • Save the metadata string as a .json file in the same folder as this notebook
  • Load [Package('string')] and validate [dp.valid] the metadata string
  • If the validation fails, an error description [dp.errors] is printed
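
The first step above, saving the metadata string as a .json file, can be sketched with the standard library alone. This is a minimal sketch: the helper name `save_metadata` and its default file name are assumptions made for illustration and are not part of datapackage-py.

```python
import json
from pathlib import Path


def save_metadata(metadata_string, folder=".", filename="datapackage.json"):
    """Parse a metadata string and write it as a .json file; return the path.

    Hypothetical helper for illustration only.
    """
    # json.loads raises a ValueError subclass if the string is not valid JSON.
    descriptor = json.loads(metadata_string)
    # A Data Package descriptor has to be a JSON object, not an array or scalar.
    if not isinstance(descriptor, dict):
        raise ValueError("A Data Package descriptor must be a JSON object")
    path = Path(folder) / filename
    path.write_text(
        json.dumps(descriptor, indent=2, ensure_ascii=False), encoding="utf-8"
    )
    return path
```

The returned path, converted with `str(path)`, is the kind of file name that is passed to `Package(...)` in the validation cell of this notebook.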

Datapackage Requirements

Taken from https://frictionlessdata.io/specs/data-package/ and https://frictionlessdata.io/specs/data-resource/.

The key words MUST, MUST NOT, REQUIRED, SHALL, SHALL NOT, SHOULD, SHOULD NOT, RECOMMENDED, MAY, and OPTIONAL in this document are to be interpreted as described in RFC 2119.

  1. [FILE] A Data Package descriptor MUST be a valid JSON object. (JSON is defined in RFC 4627). When available as a file it MUST be named datapackage.json and it MUST be placed in the top-level directory (relative to any other resources provided as part of the data package).
  2. [resources] The descriptor MUST contain a resources property describing the data resources. The resources property is required, with at least one resource. Packaged data resources are described in the resources property of the package descriptor. This property MUST be an array of objects. Each object MUST follow the Data Resource specification.
  3. [name] A short url-usable (and preferably human-readable) name of the package. This MUST be lower-case and contain only alphanumeric characters along with ".", "_" or "-" characters. It will function as a unique identifier and therefore SHOULD be unique in relation to any registry in which this package will be deposited (and preferably globally unique).
  4. [licenses] MUST be an array. Each item in the array is a license. Each MUST be an object. The object MUST contain a name property and/or a path property. It MAY contain a title property.
    1. [name]: The name MUST be an Open Definition license ID
    2. [path]: A url-or-path string, that is a fully qualified HTTP address, or a relative POSIX path
  5. [contributors] The people or organizations who contributed to this Data Package. It MUST be an array. Each entry is a Contributor and MUST be an object. A Contributor MUST have a title property and MAY contain path, email, role and organization properties.
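
To make the requirements above more concrete, the following sketch checks a descriptor dictionary against requirements 2 to 5 using only the standard library. It is not how datapackage-py validates internally; the function name `check_descriptor` and the message texts are assumptions for illustration. The sketch checks `name`, `licenses` and `contributors` only when they are present, and it does not verify that a license name is a registered Open Definition license ID.

```python
import re

# Lower-case alphanumeric characters plus ".", "_" and "-" (requirement 3).
NAME_PATTERN = re.compile(r"[a-z0-9._-]+")


def check_descriptor(descriptor):
    """Return a list of problems; an empty list means all checks passed."""
    if not isinstance(descriptor, dict):
        return ["descriptor must be a JSON object"]
    problems = []

    # Requirement 2: resources is an array of objects with at least one entry.
    resources = descriptor.get("resources")
    if not isinstance(resources, list) or not resources:
        problems.append("resources must be an array with at least one resource")
    elif not all(isinstance(r, dict) for r in resources):
        problems.append("each resource must be an object")

    # Requirement 3: name is lower-case and url-usable.
    name = descriptor.get("name")
    if name is not None:
        if not (isinstance(name, str) and NAME_PATTERN.fullmatch(name)):
            problems.append("name must be lower-case alphanumeric plus . _ -")

    # Requirement 4: licenses is an array of objects with name and/or path.
    licenses = descriptor.get("licenses")
    if licenses is not None:
        if not isinstance(licenses, list):
            problems.append("licenses must be an array")
        else:
            for i, lic in enumerate(licenses):
                if not isinstance(lic, dict) or not ("name" in lic or "path" in lic):
                    problems.append(f"licenses[{i}] needs a name and/or a path")

    # Requirement 5: contributors is an array of objects with a title.
    contributors = descriptor.get("contributors")
    if contributors is not None:
        if not isinstance(contributors, list):
            problems.append("contributors must be an array")
        else:
            for i, person in enumerate(contributors):
                if not isinstance(person, dict) or "title" not in person:
                    problems.append(f"contributors[{i}] needs a title")

    return problems
```

Returning a list of messages instead of raising on the first failure mirrors the idea of `dp.errors`: all problems can be reported in one pass.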

OEP metadata v1.4

In [2]:
# Validate oep_metadata_template.json
try:
    dp = Package('oep_metadata_template.json')
    if dp.valid:
        print('Metadata is a valid DataPackage!')
    else:
        pp.pprint(dp.errors)  # describe why the validation failed
except Exception:
    print('No valid JSON file!')
Metadata is a valid DataPackage!
In [3]:
# print JSON
pp.pprint(dp.descriptor)
{'_comment': {'dates': 'Dates and time must follow the ISO8601 including time '
                       'zone (YYYY-MM-DD or YYYY-MM-DDThh:mm:ss±hh)',
              'languages': 'Languages must follow the IETF (BCP47) format '
                           '(en-GB, en-US, de-DE)',
              'licenses': 'License name must follow the SPDX License List '
              'metadata': 'Metadata documentation and explanation '
              'none': 'If not applicable use (none)',
              'review': 'Following the OEP Data Review '
              'units': 'Use a space between numbers and units (100 m)'},
 'context': {'contact': '',
             'documentation': '',
             'grantNo': '',
             'homepage': '',
             'sourceCode': ''},
 'contributors': [{'comment': '',
                   'date': '',
                   'email': '',
                   'object': '',
                   'title': ''},
                  {'comment': '',
                   'date': '',
                   'email': '',
                   'object': '',
                   'title': ''}],
 'description': '',
 'id': '',
 'keywords': [''],
 'language': ['en-GB'],
 'licenses': [{'attribution': '© CopyrightOwner',
               'instruction': 'https://tldrlegal.com/license/creative-commons-attribution-4.0-international-(cc-by-4)',
               'name': 'CC-BY-4.0',
               'path': 'https://creativecommons.org/licenses/by/4.0/legalcode',
               'title': 'Creative Commons Attribution 4.0 International'}],
 'metaMetadata': {'metadataLicense': {'name': 'CC0-1.0',
                                      'path': 'https://creativecommons.org/publicdomain/zero/1.0/',
                                      'title': 'Creative Commons Zero v1.0 Universal'},
                  'metadataVersion': 'OEP-1.4'},
 'name': 'oep_metadata_table_template_v14',
 'profile': 'data-package',
 'publicationDate': '',
 'resources': [{'dialect': {'caseSensitiveHeader': False,
                            'decimalSeparator': '.',
                            'delimiter': 'none',
                            'doubleQuote': True,
                            'header': True,
                            'lineTerminator': '\r\n',
                            'quoteChar': '"',
                            'skipInitialSpace': True},
                'encoding': 'UTF-8',
                'format': 'PostgreSQL',
                'name': 'model_draft.oep_metadata_table_template_v14',
                'path': 'https://github.com/OpenEnergyPlatform/examples/tree/master/metadata',
                'profile': 'tabular-data-resource',
                'schema': {'fields': [{'description': 'Unique identifier',
                                       'format': 'default',
                                       'name': 'id',
                                       'type': 'serial',
                                       'unit': 'none'},
                                      {'description': 'Reference year',
                                       'format': 'default',
                                       'name': 'year',
                                       'type': 'integer',
                                       'unit': 'none'},
                                      {'description': 'Example value',
                                       'format': 'default',
                                       'name': 'value',
                                       'type': 'double precision',
                                       'unit': 'none'},
                                      {'description': 'Geometry',
                                       'format': 'default',
                                       'name': 'geom',
                                       'type': 'geometry(Point, 4326)',
                                       'unit': 'none'}],
                           'missingValues': [''],
                           'primaryKey': 'id'}}],
 'review': {'badge': 'platin',
            'path': 'https://github.com/OpenEnergyPlatform/data-preprocessing/wiki'},
 'sources': [{'copyright': '',
              'description': '',
              'license': '',
              'path': 'https://',
              'title': ''},
             {'copyright': '',
              'description': '',
              'license': '',
              'path': 'https://',
              'title': ''}],
 'spatial': {'extent': '', 'location': '', 'resolution': ''},
 'temporal': {'end': '', 'referenceDate': '', 'resolution': '', 'start': ''},
 'title': ''}
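
The `_comment` entry in the output above states that dates must follow ISO 8601 as YYYY-MM-DD or YYYY-MM-DDThh:mm:ss±hh. A minimal sketch of a shape check for that convention follows. The function name `follows_date_convention` is an assumption for illustration, and the pattern only checks the layout of the string, not whether the date exists in the calendar.

```python
import re

# Either a plain date, or a date with time and a two-digit UTC offset.
DATE_PATTERN = re.compile(r"\d{4}-\d{2}-\d{2}(T\d{2}:\d{2}:\d{2}[+-]\d{2})?")


def follows_date_convention(value):
    """Return True if value is shaped like YYYY-MM-DD or YYYY-MM-DDThh:mm:ss±hh."""
    return isinstance(value, str) and DATE_PATTERN.fullmatch(value) is not None
```

Such a check could be applied to fields like `publicationDate` before the metadata string is saved, since empty strings and other layouts are rejected.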

If you find bugs or if you have ideas to improve the Open Energy Platform, you are welcome to add your comments to the existing issues on GitHub.
You can also fork the project and get involved.

Please note that the platform is still under construction and therefore the design of this page is still highly volatile!