df_normalize_denormalize

OpenEnergy Platform

OpenEnergyPlatform



Normalizing / Denormalizing

This tutorial explains how to transform a custom number of columns of a table into rows (Normalizing) and vice versa (Denormalizing). Please report bugs and improvements here: https://github.com/OpenEnergyPlatform/examples/issues

In [1]:
__copyright__ = "Reiner Lemoine Institut"
__license__   = "GNU Affero General Public License Version 3 (AGPL-3.0)"
__url__       = "https://github.com/openego/data_processing/blob/master/LICENSE"
__author__    = "oakca, Ludee"
In [2]:
import pandas as pd
In [3]:
# open df.xlsx
with pd.ExcelFile('df.xlsx') as xls:

    # save Sheet1 in df.xlsx as df
    df = xls.parse('Sheet1')

# show df
df
Out[3]:
Site Process inst-cap cap-lo cap-up inv-cost fix-cost var-cost wacc depreciation
0 Mid Coal plant 0 0 0 600000 0 0.6 0.07 40
1 Mid Lignite plant 0 0 60000 600000 0 0.6 0.07 40
2 Mid Gas plant 0 0 80000 450000 0 1.6 0.07 30
3 Mid Biomass plant 0 0 5000 875000 0 1.4 0.07 25
4 Mid Wind plant 0 0 13000 1500000 0 0.0 0.07 25
5 Mid Solar plant 0 0 160000 600000 0 0.0 0.07 25
6 Mid Hydro plant 0 0 1400 1600000 0 0.0 0.07 50
7 South Coal plant 0 0 100000 600000 0 0.6 0.07 40
8 South Lignite plant 0 0 0 600000 0 0.6 0.07 40
9 South Gas plant 0 0 100000 450000 0 1.6 0.07 30
10 South Biomass plant 0 0 0 875000 0 1.4 0.07 25
11 South Wind plant 0 0 200000 1500000 0 0.0 0.07 25
12 South Solar plant 0 0 600000 600000 0 0.0 0.07 25
13 South Hydro plant 0 0 0 1600000 0 0.0 0.07 50
14 North Coal plant 0 0 100000 600000 0 0.6 0.07 40
15 North Lignite plant 0 0 0 600000 0 0.6 0.07 40
16 North Gas plant 0 0 100000 450000 0 1.6 0.07 30
17 North Biomass plant 0 0 6000 875000 0 1.4 0.07 25
18 North Wind plant 0 0 60000 1500000 0 0.0 0.07 25
19 North Solar plant 0 0 3000 600000 0 0.0 0.07 25
20 North Hydro plant 0 0 20000 1600000 0 0.0 0.07 50

Normalizing

In [4]:
# Normalizing the df with melt (we keep the two columns 'Site' and 'Process'), we also add column unit, sort the values 
# and make sure no old indexes are used as columns.
# As a result we have the two kept columns 'Site' and 'Process' and all other columns are transposed into rows 
# labeled as variable and the respective values are found in value. The newly added column unit contains the respective units
norm = df.melt(['Site', 'Process']).assign(unit='').sort_values(['Site','Process']).reset_index(drop=True)

# assign values for unit
unit = {'inst-cap': 'MW', 'cap-lo': 'MW', 'cap-up': 'MW',
        'inv-cost': '€/MW', 'fix-cost': '€/MW/a', 'var-cost': '€/MWh',
        'wacc': None, 'depreciation': 'a'}

# include units in table
norm['unit'] = norm['variable'].map(unit)

# show normalized df
norm
Out[4]:
Site Process variable value unit
0 Mid Biomass plant inst-cap 0.00 MW
1 Mid Biomass plant cap-lo 0.00 MW
2 Mid Biomass plant cap-up 5000.00 MW
3 Mid Biomass plant inv-cost 875000.00 €/MW
4 Mid Biomass plant fix-cost 0.00 €/MW/a
5 Mid Biomass plant var-cost 1.40 €/MWh
6 Mid Biomass plant wacc 0.07 None
7 Mid Biomass plant depreciation 25.00 a
8 Mid Coal plant inst-cap 0.00 MW
9 Mid Coal plant cap-lo 0.00 MW
10 Mid Coal plant cap-up 0.00 MW
11 Mid Coal plant inv-cost 600000.00 €/MW
12 Mid Coal plant fix-cost 0.00 €/MW/a
13 Mid Coal plant var-cost 0.60 €/MWh
14 Mid Coal plant wacc 0.07 None
15 Mid Coal plant depreciation 40.00 a
16 Mid Gas plant inst-cap 0.00 MW
17 Mid Gas plant cap-lo 0.00 MW
18 Mid Gas plant cap-up 80000.00 MW
19 Mid Gas plant inv-cost 450000.00 €/MW
20 Mid Gas plant fix-cost 0.00 €/MW/a
21 Mid Gas plant var-cost 1.60 €/MWh
22 Mid Gas plant wacc 0.07 None
23 Mid Gas plant depreciation 30.00 a
24 Mid Hydro plant inst-cap 0.00 MW
25 Mid Hydro plant cap-lo 0.00 MW
26 Mid Hydro plant cap-up 1400.00 MW
27 Mid Hydro plant inv-cost 1600000.00 €/MW
28 Mid Hydro plant fix-cost 0.00 €/MW/a
29 Mid Hydro plant var-cost 0.00 €/MWh
... ... ... ... ... ...
138 South Hydro plant cap-up 0.00 MW
139 South Hydro plant inv-cost 1600000.00 €/MW
140 South Hydro plant fix-cost 0.00 €/MW/a
141 South Hydro plant var-cost 0.00 €/MWh
142 South Hydro plant wacc 0.07 None
143 South Hydro plant depreciation 50.00 a
144 South Lignite plant inst-cap 0.00 MW
145 South Lignite plant cap-lo 0.00 MW
146 South Lignite plant cap-up 0.00 MW
147 South Lignite plant inv-cost 600000.00 €/MW
148 South Lignite plant fix-cost 0.00 €/MW/a
149 South Lignite plant var-cost 0.60 €/MWh
150 South Lignite plant wacc 0.07 None
151 South Lignite plant depreciation 40.00 a
152 South Solar plant inst-cap 0.00 MW
153 South Solar plant cap-lo 0.00 MW
154 South Solar plant cap-up 600000.00 MW
155 South Solar plant inv-cost 600000.00 €/MW
156 South Solar plant fix-cost 0.00 €/MW/a
157 South Solar plant var-cost 0.00 €/MWh
158 South Solar plant wacc 0.07 None
159 South Solar plant depreciation 25.00 a
160 South Wind plant inst-cap 0.00 MW
161 South Wind plant cap-lo 0.00 MW
162 South Wind plant cap-up 200000.00 MW
163 South Wind plant inv-cost 1500000.00 €/MW
164 South Wind plant fix-cost 0.00 €/MW/a
165 South Wind plant var-cost 0.00 €/MWh
166 South Wind plant wacc 0.07 None
167 South Wind plant depreciation 25.00 a

168 rows × 5 columns

Denormalizing

In [5]:
# denormalizing the norm with pivot_table, transforms the table back into the original state (apart from column order)
denorm = norm.pivot_table(values='value', index=['Site', 'Process'], columns='variable').reset_index()

# remove the variable axis name
denorm = denorm.rename_axis(None, axis=1) 

# show denormalized df
denorm

# note: the only difference between denorm and df is that the column names are in alphabetic order
Out[5]:
Site Process cap-lo cap-up depreciation fix-cost inst-cap inv-cost var-cost wacc
0 Mid Biomass plant 0.0 5000.0 25.0 0.0 0.0 875000.0 1.4 0.07
1 Mid Coal plant 0.0 0.0 40.0 0.0 0.0 600000.0 0.6 0.07
2 Mid Gas plant 0.0 80000.0 30.0 0.0 0.0 450000.0 1.6 0.07
3 Mid Hydro plant 0.0 1400.0 50.0 0.0 0.0 1600000.0 0.0 0.07
4 Mid Lignite plant 0.0 60000.0 40.0 0.0 0.0 600000.0 0.6 0.07
5 Mid Solar plant 0.0 160000.0 25.0 0.0 0.0 600000.0 0.0 0.07
6 Mid Wind plant 0.0 13000.0 25.0 0.0 0.0 1500000.0 0.0 0.07
7 North Biomass plant 0.0 6000.0 25.0 0.0 0.0 875000.0 1.4 0.07
8 North Coal plant 0.0 100000.0 40.0 0.0 0.0 600000.0 0.6 0.07
9 North Gas plant 0.0 100000.0 30.0 0.0 0.0 450000.0 1.6 0.07
10 North Hydro plant 0.0 20000.0 50.0 0.0 0.0 1600000.0 0.0 0.07
11 North Lignite plant 0.0 0.0 40.0 0.0 0.0 600000.0 0.6 0.07
12 North Solar plant 0.0 3000.0 25.0 0.0 0.0 600000.0 0.0 0.07
13 North Wind plant 0.0 60000.0 25.0 0.0 0.0 1500000.0 0.0 0.07
14 South Biomass plant 0.0 0.0 25.0 0.0 0.0 875000.0 1.4 0.07
15 South Coal plant 0.0 100000.0 40.0 0.0 0.0 600000.0 0.6 0.07
16 South Gas plant 0.0 100000.0 30.0 0.0 0.0 450000.0 1.6 0.07
17 South Hydro plant 0.0 0.0 50.0 0.0 0.0 1600000.0 0.0 0.07
18 South Lignite plant 0.0 0.0 40.0 0.0 0.0 600000.0 0.6 0.07
19 South Solar plant 0.0 600000.0 25.0 0.0 0.0 600000.0 0.0 0.07
20 South Wind plant 0.0 200000.0 25.0 0.0 0.0 1500000.0 0.0 0.07
In [6]:
# In case of columns containing NaN values, pivot_table will omit these columns in the denormalized table.
# Hence, in this case another option is to use unstack():

# denormalizing the norm with unstack
denorm = norm.set_index(['Site', 'Process', 'variable'])['value'].unstack().reset_index()

# remove the variable axis name
denorm = denorm.rename_axis(None, axis=1) 

# show denormalized df
denorm
Out[6]:
Site Process cap-lo cap-up depreciation fix-cost inst-cap inv-cost var-cost wacc
0 Mid Biomass plant 0.0 5000.0 25.0 0.0 0.0 875000.0 1.4 0.07
1 Mid Coal plant 0.0 0.0 40.0 0.0 0.0 600000.0 0.6 0.07
2 Mid Gas plant 0.0 80000.0 30.0 0.0 0.0 450000.0 1.6 0.07
3 Mid Hydro plant 0.0 1400.0 50.0 0.0 0.0 1600000.0 0.0 0.07
4 Mid Lignite plant 0.0 60000.0 40.0 0.0 0.0 600000.0 0.6 0.07
5 Mid Solar plant 0.0 160000.0 25.0 0.0 0.0 600000.0 0.0 0.07
6 Mid Wind plant 0.0 13000.0 25.0 0.0 0.0 1500000.0 0.0 0.07
7 North Biomass plant 0.0 6000.0 25.0 0.0 0.0 875000.0 1.4 0.07
8 North Coal plant 0.0 100000.0 40.0 0.0 0.0 600000.0 0.6 0.07
9 North Gas plant 0.0 100000.0 30.0 0.0 0.0 450000.0 1.6 0.07
10 North Hydro plant 0.0 20000.0 50.0 0.0 0.0 1600000.0 0.0 0.07
11 North Lignite plant 0.0 0.0 40.0 0.0 0.0 600000.0 0.6 0.07
12 North Solar plant 0.0 3000.0 25.0 0.0 0.0 600000.0 0.0 0.07
13 North Wind plant 0.0 60000.0 25.0 0.0 0.0 1500000.0 0.0 0.07
14 South Biomass plant 0.0 0.0 25.0 0.0 0.0 875000.0 1.4 0.07
15 South Coal plant 0.0 100000.0 40.0 0.0 0.0 600000.0 0.6 0.07
16 South Gas plant 0.0 100000.0 30.0 0.0 0.0 450000.0 1.6 0.07
17 South Hydro plant 0.0 0.0 50.0 0.0 0.0 1600000.0 0.0 0.07
18 South Lignite plant 0.0 0.0 40.0 0.0 0.0 600000.0 0.6 0.07
19 South Solar plant 0.0 600000.0 25.0 0.0 0.0 600000.0 0.0 0.07
20 South Wind plant 0.0 200000.0 25.0 0.0 0.0 1500000.0 0.0 0.07

Please check your final table, depending on your data it might not always work as expected.

In [ ]:
 

If you find bugs or if you have ideas to improve the Open Energy Platform, you are welcome to add your comments to the existing issues on GitHub.
You can also fork the project and get involved.

Please note that the platform is still under construction and therefore the design of this page is still highly volatile!