Data models are one of the most overlooked software design aspects in Power Systems software,
and in a field that relies on computer simulation, bad designs lead to unnecessary complexity and
bad performance. The need for a canonical data model to share data dates back to the 1950s, with
the first applications of a digital computer to the “load flow” problem, and it appeared in the
literature in a discussion on how to implement a digital load flow calculator 1.
With the widespread adoption of computer based load flow calculations, myriad solution methods and models
led to a problem of data and model exchange, as illustrated by the following quote:
“With the growth in complexity of the interconnected power systems in the 1960’s came a
corresponding growth in the number of load flow programs being used and in the number of
study groups using those programs. This growth resulted in a need to exchange data at an
increasing rate.”
Working Group on Common Format For Exchange of Solved Load Flow Data, 1973
Historically, electric power systems modeling has been the source of complex data requirements.
Most importantly, there has been an explicit division between power systems models based on
their scope. Different models require different simplifications to obtain system insights
that engineers require. These engineering simplifications and assumptions
have also carried over to other fields like energy policy and economics. In this regard, the
most significant model that has informed data processing and sharing is the “load flow” problem
with its extension to the single-period “economic load flow” problem.
Leon Kirchmayer, in his seminal work about power system economic operation 2, provides
a detailed account about the early use of computers for the economic optimization of power
systems. These rudimentary computational systems were limited to punch cards as the medium
to load data. As a result, the first data models were merely column indexes to physical quantities.
Punch cards evolved to become fixed-position and fixed-order file data models. The first
generally agreed data model for power systems computational analysis was the IEEE Common Format,
published in 1973 3. The common format data file had lines of up to 128 characters; the
lines were grouped into sections with section headers, and data items were entered in specific
columns. It provided a standard format to store and exchange data based on the original
punch card specification, emulating the physical storage medium that preceded it.
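To make the idea of a fixed-position record concrete, the following is a minimal sketch of reading and writing one such line. The column layout, the field names, and the helpers `parse_bus_line` and `format_bus_line` are hypothetical and chosen only for illustration; they are not the actual IEEE Common Format column assignments.

```python
from dataclasses import dataclass

# Hypothetical column layout (NOT the actual IEEE Common Format columns):
# field name -> (start, end) as 0-based, end-exclusive character positions.
BUS_COLUMNS = {
    "number": (0, 4),
    "name": (5, 17),
    "load_mw": (18, 27),
    "load_mvar": (28, 37),
}


@dataclass
class BusRecord:
    number: int
    name: str
    load_mw: float
    load_mvar: float


def parse_bus_line(line: str) -> BusRecord:
    """Slice one fixed-width line into a BusRecord using BUS_COLUMNS."""
    # Pad short lines so that every slice is well defined.
    width = max(end for _, end in BUS_COLUMNS.values())
    padded = line.ljust(width)
    raw = {
        field: padded[start:end].strip()
        for field, (start, end) in BUS_COLUMNS.items()
    }
    return BusRecord(
        number=int(raw["number"]),
        name=raw["name"],
        # Blank numeric fields are read as zero in this sketch.
        load_mw=float(raw["load_mw"] or 0.0),
        load_mvar=float(raw["load_mvar"] or 0.0),
    )


def format_bus_line(bus: BusRecord) -> str:
    """Write a BusRecord back into the same fixed-width layout."""
    return (
        f"{bus.number:>4} {bus.name:<12} "
        f"{bus.load_mw:>9.2f} {bus.load_mvar:>9.2f}"
    )


if __name__ == "__main__":
    line = format_bus_line(BusRecord(2, "Bus 2 HV", 21.7, 12.7))
    print(repr(line))
    print(parse_bus_line(line))
```

In a layout like this, the meaning of each value is carried only by its character position, so both reader and writer must agree on the column table in advance.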
Although since 1973 there has been a significant increase in computational power, algorithm
development, and novel applications of computers to the analysis of electrical power systems,
tabular data models still dominate the field. All major data formats and models for commercial and
academic power systems software have employed tables with custom specifications to store
and exchange system data. In the context of open-source modeling, the data format used in
MATPOWER is the standard for encoding system data sets, owing to the
popularity of MATLAB among power systems researchers.
The need to share information evolved in the early 1990s with the advent of automation,
spurred by increasingly complex data needs for power systems operations. The industry
required standardized models to exchange more extensive information and resorted to an
object-oriented data model. The Common Information Model (CIM) was developed and later made a
standard maintained by the IEC Technical Committee 57 Working Group 13. The aim was to provide
a standard definition for power system components geared towards automated EMS, SCADA systems,
and asset-management databases. Its automation-oriented design makes CIM challenging to
implement for modeling purposes, and it is not widely used in any modeling software available
today. It is available in only a few commercial power system software packages, and the only
open-source parsing implementation is the iTesla library.
One of the key qualities of electric power systems modeling is the rigid separation between
steady-state and dynamic modeling practices. Simulation tools have kept separate data models
between the two classes of models, and a few commercial providers dominate the market for
dynamic modeling. As a result, the dynamic data model is dependent on the software available
for the researcher. Such artificial separation hinders cross-domain research and further
limits the development of newer models. Some efforts to develop open data models geared
towards dynamic modeling, such as PSAT, have been limited
to teaching and are no longer maintained. The data model implemented in Python is described
partially in 4 but has had little uptake.
With the advent of new algorithms, models, and programming languages, as well as broad
access to computers, new software tools and data formats proliferated. Milano provides a
detailed taxonomy of available commercial and open-source data sources up to 2010 4.
The review includes 17 data models categorized by the supported mathematical models and
file format restrictions.
Recently, new static modeling tools such as Pandapower, PyPSA,
and PowerModels.jl have used
data models largely based on MATPOWER’s original schema. In the dynamic modeling domain,
the tool ANDES implemented a data model using symbolic
libraries. The OpenModelica library, with the capability to parse PSS/e and CIM, is also
available. However, developing extensions requires some source code modification, and these
tools cannot be integrated with steady-state models.
The review so far highlights the progress coming from the power systems community, given a
more widespread adoption of certain “standard” practices. In other modeling communities,
several commercial software applications dominate, and each relies on its own proprietary
data format. Such is the case of production cost modeling, which requires a richer data model
to handle large amounts of time series data. Significant effort has been put towards
processing proprietary XML data formats into open data sets, but it has not resulted in a more
systematic approach.
When augmentations are required, MATPOWER provides some flexibility to augment
the data through its “extensions”, which is the most commonly used approach. Extending MATPOWER’s
data requires creating makeshift relationships between the user-added arrays and the arrays
already in the model. Fixed location and length representations are not inherently designed
to store data with mixed data representations and hierarchical structures. Tables are difficult
to extend beyond their original design. For instance, adding a new feature implies adding a
new column for every element in the category. To the authors’ knowledge, the production cost
modeling community has no effort comparable to that of the power systems community, and in most
cases, data models used in production cost modeling are extensions of power systems data models.
Moreover, the growing importance of data provenance and reproducibility demands solutions
that minimize the need to develop ad-hoc data models.
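The cost of extending a table can be illustrated with a short sketch. Everything here is an assumption made for illustration: the column indexes, the `add_column` helper, and the `Generator` and `Storage` classes are hypothetical and do not reproduce MATPOWER's actual layout or any existing library's API. The first part adds a storage-only quantity to a generator table, which forces a placeholder into every other row; the second part shows how a small type hierarchy can hold the same quantity only where it applies.

```python
from dataclasses import dataclass

# Hypothetical column indexes for a generator table
# (illustrative only, not MATPOWER's actual column layout).
GEN_BUS, GEN_PG, GEN_PMAX = 0, 1, 2

gen_table = [
    [1, 50.0, 100.0],  # thermal unit
    [2, 20.0, 40.0],   # wind unit
    [3, 10.0, 30.0],   # storage unit
]


def add_column(table, values_by_row, fill=None):
    """Return a copy of the table with one extra column.

    Rows that have no entry in `values_by_row` receive `fill`.
    """
    return [row + [values_by_row.get(i, fill)] for i, row in enumerate(table)]


# Energy capacity only applies to the storage unit (row 2),
# yet every row in the table gains the new column.
GEN_ENERGY_MWH = 3
extended = add_column(gen_table, {2: 120.0})


# A hierarchical alternative: the extra field exists only on the subtype.
@dataclass
class Generator:
    bus: int
    active_power_mw: float
    max_active_power_mw: float


@dataclass
class Storage(Generator):
    energy_capacity_mwh: float


if __name__ == "__main__":
    for row in extended:
        print(row)
    print(Storage(bus=3, active_power_mw=10.0,
                  max_active_power_mw=30.0, energy_capacity_mwh=120.0))
```

In the table version, code that reads `GEN_ENERGY_MWH` must also know which rows hold a meaningful value, and that relationship lives outside the data itself.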
In recent years, multi-sector modeling of energy systems has been increasing. Initiatives
like PowerGenome, Spine Toolbox, and the Open Energy Platform have focused
on integrating power systems data into broad energy infrastructure models. These initiatives
exploit modern computing concepts and architectures like REST APIs, portable databases, and
version control to provide users with a more straightforward pathway to integrate decision
models with data. Importantly, these initiatives seek to contribute curated datasets as
part of their repositories. Multi-sector projects commonly focus on long-term planning and
strategic decision-making, which require economic data on top of the technical device-level
data. These techno-economic modeling communities make outstanding contributions by exploiting
modern concepts in data management for large systems. For instance, the Open Energy Platform
implements advanced table format data sets to facilitate the inspection of datasets.
As a consequence of this explosion of data representations, model developers devote significant
resources to parsing and data model conversion. In most cases, these efforts are developed
to serve only within the scope of a single analytical model. Creating a standard data model and dedicated
tools for data management across domains is critical to improving electric energy systems’
modeling practices.
@article{lara2021powersystems,
  title={PowerSystems.jl---A power system data management package for large scale modeling},
  author={Lara, Jos{\'e} Daniel and Barrows, Clayton and Thom, Daniel and Krishnamurthy, Dheepak and Callaway, Duncan},
  journal={SoftwareX},
  volume={15},
  pages={100747},
  year={2021},
  publisher={Elsevier}
}
1. J. M. Henderson, “Automatic Digital Computer Solution of Load Flow Studies [includes discussion],” in Transactions of the American Institute of Electrical Engineers. Part III: Power Apparatus and Systems, vol. 73, no. 2, pp. 1696-1702, Jan. 1954.
2. L. K. Kirchmayer, Economic Operation of Power Systems. New York: Wiley, 1958.
3. Working Group, “Common Format For Exchange of Solved Load Flow Data,” in IEEE Transactions on Power Apparatus and Systems, vol. PAS-92, no. 6, pp. 1916-1925, Nov. 1973, doi: 10.1109/TPAS.1973.293571.
4. F. Milano, Power System Modelling and Scripting. Springer Science & Business Media, 2010.