Network Management Operations I. D. Martinez-Casanueva
Internet-Draft L. Cabanillas
Intended status: Informational Telefonica
Expires: 30 August 2025 P. Martinez-Julia
NICT
26 February 2025

Knowledge Graph Construction from Network Data Sources
draft-marcas-nmop-kg-construct-00

Abstract

This document discusses the mechanisms that support the management
and creation of knowledge graphs from data sources specific to the
network management domain. The document provides background on core
aspects such as ontology development, identifies methodologies and
standards, and shares guidelines for integrating network data
sources.

About This Document

This note is to be removed before publishing as an RFC.

The latest revision of this draft can be found at
https://idomingu.github.io/knowledge-graph-yang/draft-marcas-knowledge-graph-yang.html.
Status information for this document may be found at
https://datatracker.ietf.org/doc/draft-marcas-nmop-kg-construct/.

Discussion of this document takes place on the Network Management
Operations Working Group mailing list (mailto:nmop@ietf.org), which
is archived at https://mailarchive.ietf.org/arch/browse/nmop/.
Subscribe at https://www.ietf.org/mailman/listinfo/nmop/.

Source for this draft and an issue tracker can be found at
https://github.com/idomingu/knowledge-graph-yang.

Status of This Memo

This Internet-Draft is submitted in full conformance with the
provisions of BCP 78 and BCP 79.

Internet-Drafts are working documents of the Internet Engineering
Task Force (IETF). Note that other groups may also distribute
working documents as Internet-Drafts. The list of current Internet-
Drafts is at https://datatracker.ietf.org/drafts/current/.

Martinez-Casanueva, et al. Expires 30 August 2025 [Page 1]

Internet-Draft kg-construct February 2025

Internet-Drafts are draft documents valid for a maximum of six months
and may be updated, replaced, or obsoleted by other documents at any
time. It is inappropriate to use Internet-Drafts as reference
material or to cite them other than as "work in progress."

This Internet-Draft will expire on 30 August 2025.

This document is subject to BCP 78 and the IETF Trust's Legal
Provisions Relating to IETF Documents (https://trustee.ietf.org/
license-info) in effect on the date of publication of this document.
Please review these documents carefully, as they describe your rights
and restrictions with respect to this document. Code Components
extracted from this document must include Revised BSD License text as
described in Section 4.e of the Trust Legal Provisions and are
provided without warranty as described in the Revised BSD License.

Table of Contents

1. Introduction
2. Conventions and Definitions
  2.1. Terminology
  2.2. Acronyms
3. Ontology Development
  3.1. Standard Development Methodologies
  3.2. Automatic Knowledge Extraction from YANG Models
4. Knowledge Graph Construction Pipeline
  4.1. Knowledge Objects
  4.2. Pipeline Steps
    4.2.1. Ingestion
    4.2.2. Mapping
    4.2.3. Integration
5. Challenges
6. Security Considerations
7. IANA Considerations
8. Open Issues
9. References
  9.1. Normative References
  9.2. Informative References
Appendix A. NETCONF Data Sources
  A.1. Prototype Architecture
  A.2. Target Ontology
  A.3. KGC Pipeline
    A.3.1. Raw data
    A.3.2. Mappings
    A.3.3. RDF data
Acknowledgments
Authors' Addresses

1. Introduction

Knowledge graphs introduce a new paradigm in data management that
facilitates the integration of heterogeneous data silos by means of a
semantic layer. In the case of network management, knowledge graphs
provide a data integration solution that can cope with the diversity
of network data sources and telemetry mechanisms
[I-D.mackey-nmop-kg-for-netops].

The construction of knowledge graphs is a challenging activity that
requires a combination of skills in semantic modelling and data
engineering. Semantic data models are represented by ontologies and
other forms of structured knowledge, which must be kept in sync with
the data pipelines that integrate the different data silos into the
knowledge graph. The data integration process is based on the
ingestion of raw data from their data sources, the mapping of the raw
data to the respective ontologies, and the transformation of the data
into a semantically annotated graph structure.

In this sense, Knowledge Graph Construction (KGC) is underpinned by
two pillars: i) ontology development; and ii) the knowledge graph
construction pipeline. These pillars are described in detail in the
following sections.

2. Conventions and Definitions

The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
"SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and
"OPTIONAL" in this document are to be interpreted as described in
BCP 14 [RFC2119] [RFC8174] when, and only when, they appear in all
capitals, as shown here.

2.1. Terminology

This document defines the following terms:

Data integration: Process of combining data from diverse sources into
a unified view.

Data mapping: Technique that defines how data from one data model
corresponds to another data model.

Data materialization: Technique that collects data from a remote data
source and persists a copy of the data in a target data store. This
process can also be seen as Extract-Transform-Load (ETL).

Data virtualization: Technique wherein an intermediate component
(i.e., a data virtualization layer) exposes data available in remote
data sources without creating a copy of the data. The data
virtualization layer keeps pointers to the original location of the
data, so when a data consumer asks for these data, the virtualization
layer collects the data from the source and directly serves them to
the consumer.

Ontology: Formal, shared representation of knowledge in a domain.

2.2. Acronyms

CQ: Competency Question

ETL: Extract-Transform-Load

KG: Knowledge Graph

KGC: Knowledge Graph Construction

LOT: Linked Open Terms

OWL: Web Ontology Language

RDF: Resource Description Framework

RDFS: RDF Schema

RML: RDF Mapping Language

SAREF: Smart Applications REFerence

SHACL: Shapes Constraint Language

W3C: World Wide Web Consortium

3. Ontology Development

Ontologies provide the formal representation of the conceptual models
that capture the semantics of data and, building on this, enable the
integration of data into the knowledge graph. Ontologies can be
developed following different techniques, ranging from manual to
fully automated, depending on the characteristics of the data to be
integrated into the knowledge graph (e.g., format or schema).

3.1. Standard Development Methodologies

Developing an ontology is a challenging task that requires skills in
knowledge management and semantic modelling. To ease this process, a
good practice is to follow mature, proven methodologies that provide
thorough guidelines and recommend tools that can help in the
development of an ontology. An example of these methodologies is
Linked Open Terms (LOT) [Poveda-Villalon2022].

LOT is an ontology development methodology that adopts best practices
from agile software development. The methodology has been widely
used in European projects as well as in the creation of the ETSI
SAREF ontology and its extensions. Notably, with the SAREF ontology,
ETSI tackled a similar problem in the scope of the Internet of Things
(IoT), where there is a heterogeneous variety of standard data models
and protocols. The methodology iterates over a workflow of the
following four activities:

1. ontology requirements specification,

2. ontology implementation,

3. ontology publication, and

4. ontology maintenance.

The workflow starts with the specification of the requirements that
the ontology must fulfill. To that end, the methodology requires
collecting knowledge from domain experts as well as analyzing the
data sources (e.g., network devices) and the schemas of the data
(e.g., YANG data models) to be ingested and integrated into the
knowledge graph. To specify these requirements, LOT recommends
several approaches such as competency questions (CQs), natural
language statements, or tabular information inspired by
METHONTOLOGY.

3.2. Automatic Knowledge Extraction from YANG Models

The extraction of knowledge from YANG models could be automated, for
example, by analyzing YANG identities to generate controlled
vocabularies and taxonomies.

[RFC7950] defines a YANG identity as a "globally unique, abstract, and
untyped identity"; therefore, mapping a YANG identity to a concept is
straightforward. Additionally, a YANG identity can be derived from
other YANG identities via the "base" statement. These ideas align
with the notion of a taxonomy, where concepts are hierarchically
linked with other concepts.

To support the creation of knowledge structures like taxonomies or
thesauri, the W3C standardized the Simple Knowledge Organization
System (SKOS). In this ontology, a concept scheme comprises a set of
concepts that can be linked with other concepts via hierarchical and
associative relations. Typically, a YANG model containing YANG
identities can be represented as an instance of the
"skos:ConceptScheme" class. Next, each YANG identity included in the
YANG model can be represented as an instance of "skos:Concept"
contained in the concept scheme. Lastly, for each YANG identity that
includes a "base" statement, the respective SKOS concept includes a
"skos:broader" relation whose object is the SKOS concept representing
the parent (base) YANG identity.
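The mapping described above can be sketched in Python. This is a
minimal illustration under stated assumptions, not a tool from this
document: it uses regular expressions on a hypothetical module
("example-types") instead of a real YANG parser such as pyang or
libyang, mints IRIs under a hypothetical base
("https://example.org/yang/"), expresses containment in the concept
scheme with "skos:inScheme", and only resolves "base" statements that
point to identities defined in the same module.

```python
import re

# Hypothetical YANG module excerpt, used only for illustration.
YANG_SNIPPET = """
module example-types {
  yang-version 1.1;
  namespace "urn:example:types";
  prefix ex;
  identity transport-type;
  identity tcp { base transport-type; }
  identity udp { base ex:transport-type; }
}
"""

# Hypothetical IRI space under which the SKOS resources are minted.
BASE_IRI = "https://example.org/yang/"

MODULE_RE = re.compile(r"\bmodule\s+([\w.-]+)")
PREFIX_RE = re.compile(r"^\s*prefix\s+([\w.-]+)\s*;", re.MULTILINE)
IDENTITY_RE = re.compile(r"\bidentity\s+([\w.-]+)\s*(?:;|\{([^}]*)\})")
BASE_RE = re.compile(r"\bbase\s+(?:([\w.-]+):)?([\w.-]+)\s*;")


def identities_to_skos(yang_text: str) -> list[str]:
    """Map a module to a skos:ConceptScheme and its identities to skos:Concepts."""
    module = MODULE_RE.search(yang_text).group(1)
    # The module header precedes imports, so the first match is the module's own prefix.
    own_prefix = PREFIX_RE.search(yang_text).group(1)
    scheme = f"<{BASE_IRI}{module}>"
    triples = [
        "@prefix skos: <http://www.w3.org/2004/02/skos/core#> .",
        f"{scheme} a skos:ConceptScheme .",
    ]
    for name, body in IDENTITY_RE.findall(yang_text):
        concept = f"<{BASE_IRI}{module}/{name}>"
        triples.append(f"{concept} a skos:Concept ; skos:inScheme {scheme} .")
        for prefix, parent in BASE_RE.findall(body):
            # Simplification: bases imported from other modules are skipped,
            # since resolving them would require the module's import table.
            if prefix in ("", own_prefix):
                triples.append(f"{concept} skos:broader <{BASE_IRI}{module}/{parent}> .")
    return triples


for line in identities_to_skos(YANG_SNIPPET):
    print(line)
```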

4. Knowledge Graph Construction Pipeline

4.1. Knowledge Objects

The intrinsic nature of knowledge graphs is to connect as much
knowledge as possible within a certain scope (time and/or space).
However, not all processes and operations require whole knowledge
graphs. For instance, a piece of telemetry data being communicated,
organized according to the Network Telemetry Framework (NTF)
[RFC9232], can be represented as a subset of the knowledge graph of
all measurements.

A knowledge object, as defined in [EERVC], consists of a subset of a
knowledge graph of arbitrary size (from single atoms to tens or
hundreds of triples) that is decorated with metadata to facilitate
its contextualization.
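As an illustration only, a knowledge object can be modelled as a set
of triples plus a metadata dictionary. The class name
"KnowledgeObject", the helper "extract_object", the metadata keys,
and the example vocabulary namespace are all hypothetical; [EERVC] is
not claimed to define them.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone

# A triple is (subject, predicate, object); terms are kept as plain strings.
Triple = tuple[str, str, str]


@dataclass
class KnowledgeObject:
    """A subset of a knowledge graph decorated with contextual metadata.

    The metadata keys used here (source, created, vocabularies) are
    hypothetical; they only illustrate the kind of context that can
    travel with the triples.
    """
    triples: set[Triple]
    metadata: dict[str, object] = field(default_factory=dict)


def extract_object(graph: set[Triple], subject: str, source: str) -> KnowledgeObject:
    """Cut out every triple about one subject and attach metadata to it."""
    subset = {t for t in graph if t[0] == subject}
    return KnowledgeObject(
        triples=subset,
        metadata={
            "source": source,
            "created": datetime.now(timezone.utc).isoformat(),
            # Namespaces of the predicates: the vocabularies needed to interpret the subset.
            "vocabularies": sorted({p.rsplit("#", 1)[0] + "#" for _, p, _ in subset}),
        },
    )


NET = "https://example.org/net#"  # hypothetical vocabulary namespace
graph = {
    ("urn:r1:eth0", NET + "name", "eth0"),
    ("urn:r1:eth0", NET + "operStatus", "down"),
    ("urn:r1:eth1", NET + "name", "eth1"),
}
ko = extract_object(graph, "urn:r1:eth0", source="netconf-server-1")
print(len(ko.triples), ko.metadata["vocabularies"])
```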

Knowledge objects are particularly well suited to enable entities
that work with knowledge graphs to exchange pieces of knowledge,
whether obtained from their knowledge graphs or newly created from
other sources such as monitoring. This has been demonstrated in
[SECDEP].

4.2. Pipeline Steps

The construction of a knowledge graph is supported by a data pipeline
that follows the archetypal Extract-Transform-Load (ETL) pattern,
wherein the raw data is collected from the source(s), transformed,
and, finally, stored for consumption. The knowledge graph
construction pipeline can thus be split into multiple steps, as
depicted in Figure 1.

Figure 1: High-level architecture of a Knowledge Graph
Construction Pipeline

These steps are the following: ingestion, mapping, and integration.
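The chaining of the three steps can be sketched as follows, assuming
each step is a plain Python callable. The record layout, the
"urn:example:" identifiers, and the in-memory set used as a store are
hypothetical stand-ins, not components of the architecture in
Figure 1.

```python
from typing import Callable, Iterable, Iterator

Record = dict[str, str]        # raw record as emitted by a collector (assumed flat)
Triple = tuple[str, str, str]  # (subject, predicate, object)


def run_pipeline(
    ingest: Callable[[], Iterable[Record]],
    map_record: Callable[[Record], Iterable[Triple]],
    integrate: Callable[[Iterable[Triple]], None],
) -> None:
    """Chain ingestion -> mapping -> integration lazily, record by record."""
    triples: Iterator[Triple] = (t for rec in ingest() for t in map_record(rec))
    integrate(triples)


# --- Hypothetical stand-ins for the three steps ---

def ingest() -> Iterator[Record]:
    """Pretend collector: yields raw records as a NETCONF or gNMI client might."""
    yield {"device": "r1", "interface": "eth0", "status": "up"}
    yield {"device": "r1", "interface": "eth1", "status": "down"}


def map_record(rec: Record) -> Iterator[Triple]:
    """Pretend mapping rule: one status triple per interface record."""
    subject = f"<urn:example:{rec['device']}:{rec['interface']}>"
    yield (subject, "<urn:example:operStatus>", f'"{rec["status"]}"')


store: set[Triple] = set()  # in-memory stand-in for a triple store
run_pipeline(ingest, map_record, store.update)
print(sorted(store))
```

Passing each step as a callable keeps the steps swappable, for
example replacing the collector without touching the mapping or the
storage.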

4.2.1. Ingestion

This is the first step in the creation of the knowledge graph. It is
realized by means of collectors that ingest raw data from the
selected data source. These collectors implement data access
protocols that are specific to the technology and type of the data
source. For instance, for YANG-based network management, these
protocols can be NETCONF [RFC6241], RESTCONF [RFC8040], and gNMI
[GNMI].

Two main types of data sources are identified based on the techniques
used to ingest the data, namely, batch and streaming. In the case of
batch data sources, data are pulled (once or periodically) from the
data source.

Regarding streaming data sources, the collector subscribes to a YANG
server to receive notifications of YANG data periodically or upon
changes in the data source (e.g., a network device whose interface
goes down). These subscriptions can be either configured or dynamic,
using mechanisms like YANG Push [RFC8641]. Additionally, another
common scenario is the use of message broker systems like Apache
Kafka for decoupling the ingestion of streams of YANG data
[I-D.netana-nmop-yang-message-broker-integration]. Hence, knowledge
graph collectors could also support the ingestion of YANG data from
these kinds of message brokers.
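The difference between the two ingestion modes can be sketched with
two generator-based collectors. This is a simplified model under
stated assumptions: a caller-supplied "fetch" function stands in for
a NETCONF, RESTCONF, or gNMI query, and a local "queue.Queue" stands
in for a YANG Push subscription or a message broker topic; no
protocol client is used.

```python
import queue
import time
from typing import Callable, Iterator


def batch_collector(fetch: Callable[[], dict], period_s: float, rounds: int) -> Iterator[dict]:
    """Batch source: pull a full snapshot once (rounds=1) or periodically."""
    for i in range(rounds):
        yield fetch()
        if i < rounds - 1:
            time.sleep(period_s)  # wait for the next polling interval


def streaming_collector(notifications: queue.Queue) -> Iterator[dict]:
    """Streaming source: consume pushed notifications until a None sentinel."""
    while True:
        msg = notifications.get()  # blocks until the source pushes something
        if msg is None:
            return
        yield msg


# Hypothetical fetch function standing in for a NETCONF, RESTCONF, or gNMI query.
snapshots = list(batch_collector(lambda: {"ifOperStatus": "up"}, period_s=0.01, rounds=3))

# A local queue stands in for a YANG Push subscription or a Kafka topic.
events: queue.Queue = queue.Queue()
events.put({"event": "interface-down", "name": "eth0"})
events.put(None)  # sentinel that ends this demo stream
received = list(streaming_collector(events))
print(len(snapshots), received)
```

In the batch case the collector decides when data are read; in the
streaming case the source decides, and the collector only reacts.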

4.2.2. Mapping

This second step receives the raw data from the Ingestion step.
Here, the raw data is mapped to the concepts captured in one or more
ontologies. By applying mapping rules, the raw data is semantically
annotated and transformed into RDF data. Depending on the nature of
the raw data, different techniques can be applied.

In the case of (semi-)structured data such as tabular data (e.g.,
CSV, relational databases) or hierarchical data (e.g., JSON, XML)
these mappings can be defined by using declarative languages like RDF
Mapping Language (RML) [Iglesias-Molina2023].
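To illustrate why declarative mappings decouple rules from code, the
following sketch keeps the rules in a data structure and applies them
with a generic engine. The rule format and the "urn:example:" terms
are invented for this example and are not RML; an actual deployment
would express the rules in RML and run them on an RML engine.

```python
import json

RDF_TYPE = "<http://www.w3.org/1999/02/22-rdf-syntax-ns#type>"

# Mapping rules kept as data, outside the engine. This rule format is
# invented for illustration (it is NOT RML syntax), but it has similar
# ingredients: an iterator, a subject template, a class, and references.
RULES = {
    "iterator": "interfaces",
    "subject": "urn:example:interface:{name}",
    "class": "urn:example:Interface",
    "predicates": {"urn:example:name": "name", "urn:example:mtu": "mtu"},
}


def apply_rules(doc: dict, rules: dict) -> list[tuple[str, str, str]]:
    """Generic engine: knows the rule format, not the domain of the data."""
    triples = []
    for item in doc[rules["iterator"]]:
        subject = "<" + rules["subject"].format(**item) + ">"
        triples.append((subject, RDF_TYPE, f"<{rules['class']}>"))
        for predicate, reference in rules["predicates"].items():
            if reference in item:  # skip missing fields rather than fail
                # Quoted literal using JSON escaping (adequate for simple strings).
                literal = json.dumps(str(item[reference]))
                triples.append((subject, f"<{predicate}>", literal))
    return triples


raw = json.loads('{"interfaces": [{"name": "eth0", "mtu": 1500}, {"name": "eth1"}]}')
for s, p, o in apply_rules(raw, RULES):
    print(s, p, o, ".")
```

Changing what is produced only requires editing RULES; the
"apply_rules" engine stays the same.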

RML is a declarative language, currently being standardized within
the W3C Knowledge Graph Construction Community Group [W3C-KGC], that
allows for defining mapping rules for raw data encoded in
semi-structured formats like XML or JSON. The benefits of using a
declarative language like RML are twofold: i) the engine that
executes the RML rules is generic, so the mapping rules are decoupled
from the code; and ii) the explicit representation of mapping and
transformation rules as part of the knowledge graph provides data
lineage insights that can greatly improve data quality and the
troubleshooting of data pipelines. RML is making progress towards
becoming a standard, but support for additional YANG encoding formats
like CBOR [RFC8949] or Protobuf remains a challenge.

The knowledge payload carried by CBOR and/or Protobuf is organized as
knowledge objects that are transmitted by the mapping entities and
received by the materialization entities. The use of knowledge
objects allows these entities to easily "cut" knowledge graphs into
smaller pieces, transmit them, and "paste" and/or "glue" the pieces
onto the destination knowledge graph. Consistency is retained by
ensuring that the same ontologies are used with the particular
knowledge objects at both ends.
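The "cut", transmit, and "glue" cycle can be sketched as follows.
JSON is used here only because it is available in the Python standard
library; it stands in for the CBOR or Protobuf encodings discussed
above. The "ontology" metadata field, the function names, and the
example IRIs are hypothetical.

```python
import json

Triple = tuple[str, str, str]


def cut(graph: set[Triple], subjects: set[str], ontology: str) -> bytes:
    """Cut the triples about `subjects` out of `graph` into a serialized knowledge object."""
    obj = {
        "metadata": {"ontology": ontology},  # hypothetical metadata field
        "triples": sorted(list(t) for t in graph if t[0] in subjects),
    }
    return json.dumps(obj).encode("utf-8")


def glue(destination: set[Triple], payload: bytes, expected_ontology: str) -> int:
    """Paste a received knowledge object into `destination`; return how many triples were new.

    Objects built against another ontology are rejected, since consistency
    relies on both ends using the same ontologies.
    """
    obj = json.loads(payload)
    if obj["metadata"]["ontology"] != expected_ontology:
        raise ValueError("knowledge object was built against a different ontology")
    incoming = {tuple(t) for t in obj["triples"]}
    added = len(incoming - destination)
    destination.update(incoming)  # set union: duplicate triples merge naturally
    return added


ONT = "https://example.org/net#"  # hypothetical ontology namespace
source = {
    ("urn:r1", ONT + "hasInterface", "urn:r1:eth0"),
    ("urn:r1:eth0", ONT + "operStatus", "down"),
    ("urn:r2", ONT + "hasInterface", "urn:r2:eth0"),
}
destination = {("urn:r1", ONT + "hasInterface", "urn:r1:eth0")}
payload = cut(source, {"urn:r1", "urn:r1:eth0"}, ONT)
print(glue(destination, payload, ONT), len(destination))
```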

4.2.3. Integration

This is the final step of the knowledge graph construction. This step
receives as input the knowledge object that contains the RDF data
generated in the Mapping step, which consists of easily manageable
semantic triples (or quadruples), as well as metadata to
contextualize them and facilitate the incorporation of the knowledge
into the local knowledge graph storage element. At this point, the
RDF data can be sent to an RDF triple store like Apache Jena Fuseki
[Fuseki] for consumption via SPARQL. Alternatively, this step may
transform the RDF data into a labeled property graph (LPG) structure
and store the resulting data in a graph database like Neo4j [Neo4j].
Similarly, the RDF data could also be transformed into the ETSI
NGSI-LD standard [ETSI-GS-CIM-009] and stored in an NGSI-LD Context
Broker.
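One possible way to transform RDF data into an LPG structure is
sketched below. The conventions (rdf:type as node label, IRI objects
as edges, literals as node properties) are assumptions of this sketch
and are not claimed to match the behavior of [Neo4j] or any other
tool; datatyped literals, language tags, and multi-valued properties
are not handled.

```python
from collections import defaultdict

RDF_TYPE = "<http://www.w3.org/1999/02/22-rdf-syntax-ns#type>"


def local_name(iri: str) -> str:
    """Return the fragment or last path segment of an <IRI>."""
    bare = iri.strip("<>")
    return bare.rsplit("#", 1)[-1].rsplit("/", 1)[-1]


def rdf_to_lpg(triples):
    """Turn (s, p, o) triples into LPG nodes and edges.

    Conventions assumed by this sketch: rdf:type becomes a node label,
    IRI objects become edges, and plain literals become node properties
    (a later value for the same property overwrites an earlier one).
    """
    nodes = defaultdict(lambda: {"labels": set(), "props": {}})
    edges = []
    for s, p, o in triples:
        node = nodes[s]
        if p == RDF_TYPE:
            node["labels"].add(local_name(o))
        elif o.startswith("<"):
            _ = nodes[o]  # ensure the edge target exists as a node
            edges.append((s, local_name(p), o))
        else:
            node["props"][local_name(p)] = o.strip('"')
    return dict(nodes), edges


EX = "https://example.org/net#"  # hypothetical vocabulary namespace
data = [
    ("<urn:r1>", RDF_TYPE, f"<{EX}Device>"),
    ("<urn:r1>", f"<{EX}hasInterface>", "<urn:r1:eth0>"),
    ("<urn:r1:eth0>", f"<{EX}name>", '"eth0"'),
]
nodes, edges = rdf_to_lpg(data)
print(edges)
```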

5. Challenges

Ontology development: Time-consuming task that requires skills in
knowledge management and conceptual modeling. Additionally,
ontology developers should maintain tight coordination with
domain owners and ontology users. Following a standard
methodology like LOT provides guidance in the process, but the
development of the ontology still requires manual work. Tools that
can produce or bootstrap ontologies from existing data models in a
semi-automatic, or even automatic, manner are desirable. In this
sense, data models could embed explicit semantics, in the same way
that JSON-LD [JSON-LD] or CSVW [CSVW] include metadata indicating
which ontology concepts are referenced by the data.

Pipeline performance: Integrating the raw data from the original
data source into the knowledge graph entails several steps, as
described before. These steps add extra latency before the data is
stored in the knowledge graph and available for consumption. This
latency can be an important limitation for real-time analytics use
cases.

Scalability: The knowledge graph must be able to integrate massive
amounts of data collected from the network. Distributed and
federated architectures can improve the scalability of a global,
composable knowledge graph. However, these architectures add
complexity to the management of the knowledge graph as well as extra
latency when federating requests.

Virtualization: The common approach to data integration is to
materialize the data in the knowledge graph, which entails
duplicating the data. However, this approach presents multiple
limitations in terms of data governance and data cadence.
Regarding data governance, having copies of the original data
hampers keeping track of all the available data. With respect to
data cadence, in particular for batch data sources, data are
periodically pulled from the source at a particular frequency,
which might not be optimal depending on the use case. In this sense,
data virtualization introduces a new data access technique that
can overcome these limitations. With this technique, the
knowledge graph defines pointers to the data at the original
source, and the KGC pipeline performs the ingestion and mapping of
the data, and eventually delivers the data to the consumer, only
when they are requested.
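The virtualization approach can be illustrated with a minimal sketch
in which the graph stores, for each subject, a pointer to the source
(a fetch function) and a mapping function, and runs both only when
queried. The "VirtualGraph" class and the "live_state" dictionary are
hypothetical stand-ins for a data virtualization layer and a remote
data source.

```python
from typing import Callable

Triple = tuple[str, str, str]
Fetch = Callable[[], dict]
Mapper = Callable[[dict], list[Triple]]


class VirtualGraph:
    """Keeps pointers to data at the original source instead of copies.

    Each subject is bound to a (fetch, mapper) pair; nothing is collected
    or mapped until a consumer asks for that subject.
    """

    def __init__(self) -> None:
        self._pointers: dict[str, tuple[Fetch, Mapper]] = {}
        self.fetch_count = 0  # demo instrumentation only

    def register(self, subject: str, fetch: Fetch, mapper: Mapper) -> None:
        self._pointers[subject] = (fetch, mapper)

    def query(self, subject: str) -> list[Triple]:
        fetch, mapper = self._pointers[subject]
        self.fetch_count += 1
        return mapper(fetch())  # ingest and map on demand


live_state = {"status": "up"}  # stands in for data held by the original source
vg = VirtualGraph()
vg.register(
    "urn:r1:eth0",
    fetch=lambda: dict(live_state),
    mapper=lambda rec: [("urn:r1:eth0", "urn:example:operStatus", rec["status"])],
)
before = vg.fetch_count           # registering a pointer fetches nothing
first = vg.query("urn:r1:eth0")
live_state["status"] = "down"     # the source changes
second = vg.query("urn:r1:eth0")  # the consumer sees the current value
print(before, first, second)
```

Because no copy is kept, the second query reflects the change at the
source without any polling schedule; the trade-off is that every
query pays the ingestion and mapping latency.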

6. Security Considerations

Access control to data: The knowledge graph becomes an integrator of
data, which in many cases are sensitive. Therefore, data access
control mechanisms must be present to ensure that only authorized
consumers can discover and access data from the knowledge graph.
Access control policies based on roles or attributes are common
approaches, but additional aspects like the sensitivity of the data
could be included in the policy.

Integrity and authenticity of mappings: The declaration of mappings
of raw data to concepts in ontologies is a critical step in
knowledge graph construction. Unauthorized or tampered mappings can
lead to security breaches and anomalies with a significant impact on
analytics and machine learning applications that consume data from
the knowledge graph. To protect consumers from these scenarios, the
knowledge graph must include mechanisms that verify the correctness,
authenticity, and integrity of the mappings used in the construction
of the graph. Only data owners, as the parties accountable for their
data, should be authorized to define and deploy mappings for the
knowledge graph construction.
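As one illustrative integrity mechanism, a mapping document can be
tagged when the data owner approves it and verified before
deployment. The sketch below uses HMAC-SHA256 from the Python
standard library; this is a substituted technique, not one prescribed
by this document. HMAC requires the verifier to share the key,
whereas asymmetric signatures (for example, COSE signatures, as
applied to YANG data in [I-D.lopez-opsawg-yang-provenance]) avoid
that. The key and mapping text are hypothetical, and because the tag
covers the exact bytes, any reformatting of the mapping, even if
semantically equivalent, invalidates it unless a canonical form is
tagged.

```python
import hashlib
import hmac


def tag_mapping(mapping_text: str, key: bytes) -> str:
    """HMAC-SHA256 tag over the exact UTF-8 bytes of a mapping document."""
    return hmac.new(key, mapping_text.encode("utf-8"), hashlib.sha256).hexdigest()


def verify_mapping(mapping_text: str, tag: str, key: bytes) -> bool:
    """Recompute the tag and compare in constant time before deploying a mapping."""
    return hmac.compare_digest(tag_mapping(mapping_text, key), tag)


KEY = b"demo-key-held-by-the-data-owner"  # hypothetical; use a managed secret in practice
mapping = '<#TriplesMap> a rml:TriplesMap ; rml:iterator "/yanglib:modules-state/yanglib:module" .'
tag = tag_mapping(mapping, KEY)

# A tampered mapping that silently changes what is iterated over.
tampered = mapping.replace("yanglib:module", "yanglib:module-set-id")
print(verify_mapping(mapping, tag, KEY), verify_mapping(tampered, tag, KEY))
```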

Data provenance: Keeping track of the history of data as they go
through the knowledge graph construction pipeline can improve the
quality of the data in the knowledge graph. As part of the
knowledge graph construction, signatures appended to the data
[I-D.lopez-opsawg-yang-provenance] can help verify that such data
come from the golden data source and, therefore, that the data can
be trusted.

7. IANA Considerations

This document has no IANA actions.

8. Open Issues

* Should this document provide guidelines for generating URIs of
nodes/subjects in the knowledge graph? Note that there are several
levels of abstraction: device level vs. network/service level.
For example, the URI that identifies a network interface cannot be
generated only from the name of the interface, as there could be
conflicts with interfaces of other network devices having the same
name.

* Implementations? References to examples based on open-source
implementations. Integration with YANG-Push-Kafka architecture.
Target future hackathons.
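To illustrate the URI-generation open issue, the sketch below mints
interface URIs scoped by a device identifier. This is one possible
scheme, not a recommendation of this document: the base namespace is
hypothetical, and the device identifier is assumed to be unique
across the network (for example, a hostname or serial number).

```python
from urllib.parse import quote

BASE = "https://example.org/kg/"  # hypothetical namespace for minted URIs


def interface_uri(device_id: str, interface_name: str) -> str:
    """Mint an interface URI scoped by its device, so equal names do not collide.

    Both components are percent-encoded because interface names often contain
    '/' (e.g. "GigabitEthernet0/0/1"), which would otherwise add path segments.
    """
    device = quote(device_id, safe="")
    interface = quote(interface_name, safe="")
    return f"{BASE}device/{device}/interface/{interface}"


print(interface_uri("r1", "eth0"))
print(interface_uri("r2", "eth0"))
print(interface_uri("r1", "GigabitEthernet0/0/1"))
```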

9. References

9.1. Normative References

[RFC2119] Bradner, S., "Key words for use in RFCs to Indicate
Requirement Levels", BCP 14, RFC 2119,
DOI 10.17487/RFC2119, March 1997,
<https://www.rfc-editor.org/rfc/rfc2119>.

[RFC8174] Leiba, B., "Ambiguity of Uppercase vs Lowercase in RFC
2119 Key Words", BCP 14, RFC 8174, DOI 10.17487/RFC8174,
May 2017, <https://www.rfc-editor.org/rfc/rfc8174>.

9.2. Informative References

[ANSA] Martinez-Julia, P., Kafle, V. P., and H. Asaeda, "Application
of Category Theory to Network Service Fault Detection", IEEE
Open Journal of the Communications Society, vol. 5,
pp. 4417-4443, 2024.

[CSVW] "CSVW - CSV on the Web", n.d., <https://csvw.org>.

[EERVC] Martinez-Julia, P., Kafle, V. P., and H. Harai, "Exploiting
External Events for Resource Adaptation in Virtual Computer
and Network Systems", IEEE Transactions on Network and
Service Management, vol. 15, pp. 555-566, 2018.

[ETSI-GS-CIM-009]
"Context Information Management (CIM); NGSI-LD API", March
2024, <https://www.etsi.org/deliver/etsi_gs/
CIM/001_099/009/01.08.01_60/gs_CIM009v010801p.pdf>.

[Fuseki] Apache, "Apache Jena Fuseki", n.d.,
<https://jena.apache.org/documentation/fuseki2/>.

[GNMI] OpenConfig, "gRPC Network Management Interface (gNMI)",
n.d.,
<https://github.com/openconfig/reference/blob/master/rpc/
gnmi/gnmi-specification.md>.

[I-D.ietf-ivy-network-inventory-yang]
Yu, C., Belotti, S., Bouquier, J., Peruzzini, F., and P.
Bedard, "A Base YANG Data Model for Network Inventory",
Work in Progress, Internet-Draft, draft-ietf-ivy-network-
inventory-yang-04, 5 November 2024,
<https://datatracker.ietf.org/doc/html/draft-ietf-ivy-
network-inventory-yang-04>.

[I-D.ietf-netconf-yang-library-augmentedby]
Lin, Z., Claise, B., and I. D. Martinez-Casanueva,
"Augmented-by Addition into the IETF-YANG-Library", Work
in Progress, Internet-Draft, draft-ietf-netconf-yang-
library-augmentedby-01, 21 October 2024,
<https://datatracker.ietf.org/doc/html/draft-ietf-netconf-
yang-library-augmentedby-01>.

[I-D.lopez-opsawg-yang-provenance]
Lopez, D., Pastor, A., Feng, A. H., Birkholz, H., and S.
Garcia, "Applying COSE Signatures for YANG Data
Provenance", Work in Progress, Internet-Draft, draft-
lopez-opsawg-yang-provenance-04, 5 January 2025,
<https://datatracker.ietf.org/doc/html/draft-lopez-opsawg-
yang-provenance-04>.

[I-D.mackey-nmop-kg-for-netops]
Mackey, M., Claise, B., Graf, T., Keller, H., Voyer, D.,
and P. Lucente, "Knowledge Graph Framework for Network
Operations", Work in Progress, Internet-Draft, draft-
mackey-nmop-kg-for-netops-01, 21 October 2024,
<https://datatracker.ietf.org/doc/html/draft-mackey-nmop-
kg-for-netops-01>.

[I-D.netana-nmop-yang-message-broker-integration]
Graf, T. and A. Elhassany, "An Architecture for YANG-Push
to Message Broker Integration", Work in Progress,
Internet-Draft, draft-netana-nmop-yang-message-broker-
integration-00, 22 April 2024,
<https://datatracker.ietf.org/doc/html/draft-netana-nmop-
yang-message-broker-integration-00>.

[Iglesias-Molina2023]
Iglesias-Molina, A., "The RML Ontology: A Community-Driven
Modular Redesign After a Decade of Experience in Mapping
Heterogeneous Data to RDF", The Semantic Web – ISWC 2023,
October 2023,
<https://doi.org/10.1007/978-3-031-47243-5_9>.

[JSON-LD] W3C, "JSON-LD 1.1: A JSON-based Serialization for Linked
Data", July 2020, <https://www.w3.org/TR/json-ld11/>.

[Neo4j] "rdflib-neo4j - RDFLib Store backed by neo4j", n.d.,
<https://github.com/neo4j-labs/rdflib-neo4j>.

[Poveda-Villalon2022]
Poveda-Villalon, M., et al., "LOT: An industrial oriented
ontology engineering framework", Engineering Applications
of Artificial Intelligence, May 2022,
<https://doi.org/10.1016/j.engappai.2022.104755>.

[RFC6241] Enns, R., Ed., Bjorklund, M., Ed., Schoenwaelder, J., Ed.,
and A. Bierman, Ed., "Network Configuration Protocol
(NETCONF)", RFC 6241, DOI 10.17487/RFC6241, June 2011,
<https://www.rfc-editor.org/rfc/rfc6241>.

[RFC7950] Bjorklund, M., Ed., "The YANG 1.1 Data Modeling Language",
RFC 7950, DOI 10.17487/RFC7950, August 2016,
<https://www.rfc-editor.org/rfc/rfc7950>.

[RFC8040] Bierman, A., Bjorklund, M., and K. Watsen, "RESTCONF
Protocol", RFC 8040, DOI 10.17487/RFC8040, January 2017,
<https://www.rfc-editor.org/rfc/rfc8040>.

[RFC8345] Clemm, A., Medved, J., Varga, R., Bahadur, N.,
Ananthakrishnan, H., and X. Liu, "A YANG Data Model for
Network Topologies", RFC 8345, DOI 10.17487/RFC8345, March
2018, <https://www.rfc-editor.org/rfc/rfc8345>.

[RFC8641] Clemm, A. and E. Voit, "Subscription to YANG Notifications
for Datastore Updates", RFC 8641, DOI 10.17487/RFC8641,
September 2019, <https://www.rfc-editor.org/rfc/rfc8641>.

[RFC8949] Bormann, C. and P. Hoffman, "Concise Binary Object
Representation (CBOR)", STD 94, RFC 8949,
DOI 10.17487/RFC8949, December 2020,
<https://www.rfc-editor.org/rfc/rfc8949>.

[RFC9232] Song, H., Qin, F., Martinez-Julia, P., Ciavaglia, L., and
A. Wang, "Network Telemetry Framework", RFC 9232,
DOI 10.17487/RFC9232, May 2022,
<https://www.rfc-editor.org/rfc/rfc9232>.

[SECDEP] Hermosilla, A., Manjón-Cáliz, J. M., Martinez-Julia, P.,
Pastor, A., Ortiz, J., Lopez, D. R., and A. Skarmeta,
"Secure deployment of third-party applications over 5G-NFV
ML-empowered infrastructures", 7th International Conference
on Mobile Internet Security (MobiSec '23), Okinawa, Japan,
19-21 December 2023.

[TKDP] Martinez-Julia, P., Kafle, V. P., and H. Asaeda, "Telemetry
Knowledge Distributed Processing for Network Digital Twins
and Network Resilience", NOMS 2023 - 2023 IEEE/IFIP Network
Operations and Management Symposium, pp. 1-6, 2023.

[W3C-KGC] W3C, "Knowledge Graph Construction Community Group", n.d.,
<https://www.w3.org/community/kg-construct/>.

Appendix A. NETCONF Data Sources

This appendix presents a scenario that demonstrates the construction
of a knowledge graph based on YANG data collected from a NETCONF
server. In particular, the scenario tackles the creation of a data
catalog based on a knowledge graph that keeps a registry of the YANG
data sources and the YANG data models that they implement.

As described in [I-D.ietf-netconf-yang-library-augmentedby], data
catalog implementations backed by knowledge graphs provide powerful
solutions that can easily incorporate additional context into the
catalog. As an evolution of the YANG Catalog service, the resulting
knowledge graph facilitates the navigation across dependencies of
YANG modules and, more importantly, enables the combination of these
data with other data silos such as the network topology [RFC8345] or
network hardware inventory [I-D.ietf-ivy-network-inventory-yang].

To create a knowledge graph that supports the data catalog, the
proposed approach is based on collecting data from the YANG Library
from devices running in the network, in this case, from a NETCONF
server. For this, the RML engine queries the NETCONF server to
retrieve the YANG Library data, and then, applies the RML mappings to
transform the YANG data into RDF according to the target ontology.

This prototype was conducted as part of the paper "Declarative
Construction of Knowledge Graphs from NETCONF Data Sources", submitted
to the Semantic Web Journal (currently under review):
https://www.semantic-web-journal.net/content/declarative-construction-knowledge-graphs-netconf-data-sources-0

A.1. Prototype Architecture

A high-level architecture of the prototype that validates the
implementation is shown below:

Figure 2: Architecture of prototype to construct knowledge graph
from NETCONF data source

BURP was selected as the open-source RML engine for this prototype.
The NETCONF server is emulated using netopeer2.

A.2. Target Ontology

The YANG Library Ontology was developed to represent the
implementation details of YANG modules and submodules, along with
their interdependencies, in the different datastores of a YANG
server. The ontology was developed following the LOT methodology and
is publicly available at: https://w3id.org/yang/library

The code of the ontology and all related artifacts are publicly
available on GitHub:
https://github.com/candil-data-fabric/yang-library-ontology

A.3. KGC Pipeline

In addition to the YANG Library Ontology, the YANG Server Ontology
was developed to represent YANG data sources such as NETCONF servers,
and operations to retrieve data from them such as queries or
subscriptions. Similarly, this ontology was developed following the
LOT methodology and is publicly available at:
https://w3id.org/yang/server

The code of the ontology and all related artifacts are publicly
available on GitHub:
https://github.com/candil-data-fabric/yang-server-ontology

The YANG Server Ontology is used in combination with the RML
vocabulary to describe the access to YANG servers, from which the
collected data are transformed into RDF. To this end, BURP was
extended to support the ingestion of YANG data from NETCONF servers
using NETCONF queries.
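For illustration, a NETCONF request carrying the XPath filter used in
the mappings ("/yanglib:modules-state") can be built as shown below.
This sketch assumes the standard <get> operation with an XPath filter
from [RFC6241], which requires the server to advertise the :xpath
capability; it does not claim to reproduce the request that BURP
actually issues. Servers supporting NMDA may instead be queried for
the operational datastore with the <get-data> operation defined in
RFC 8526. SSH session setup and message framing are out of scope.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import quoteattr

NC_NS = "urn:ietf:params:xml:ns:netconf:base:1.0"
YANGLIB_NS = "urn:ietf:params:xml:ns:yang:ietf-yang-library"


def build_xpath_get(message_id: int, prefix: str, namespace: str, xpath: str) -> str:
    """Build a NETCONF <get> RPC with an XPath filter (RFC 6241, :xpath capability).

    The prefix used in the XPath expression must be bound on the <filter>
    element (or an ancestor), hence the xmlns:<prefix> declaration there.
    quoteattr() escapes the expression so predicates with quotes stay valid XML.
    """
    return (
        f'<rpc message-id="{message_id}" xmlns="{NC_NS}">'
        f'<get><filter type="xpath" xmlns:{prefix}="{namespace}" select={quoteattr(xpath)}/></get>'
        "</rpc>"
    )


rpc = build_xpath_get(101, "yanglib", YANGLIB_NS, "/yanglib:modules-state")
print(rpc)

# Sanity check: the request is well-formed XML and carries the expected filter.
flt = ET.fromstring(rpc).find(f"{{{NC_NS}}}get/{{{NC_NS}}}filter")
print(flt.get("type"), flt.get("select"))
```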

The following subsections include excerpts of the raw XML data
(Figure 3), the RML mappings (Figure 4), and the final RDF data
(Figure 5). The complete examples can be found at:
https://github.com/candil-data-fabric/yang-library-ontology/tree/main/knowledge-graph/xpath

A.3.1. Raw data

<modules-state xmlns="urn:ietf:params:xml:ns:yang:ietf-yang-library"
               xmlns:nc="urn:ietf:params:xml:ns:netconf:base:1.0">
  <module-set-id>1</module-set-id>
  <module>
    <name>ietf-yang-patch</name>
    <revision>2017-02-22</revision>
    <schema>file:///etc/sysrepo/yang/ietf-yang-patch@2017-02-22.yang</schema>
    <namespace>urn:ietf:params:xml:ns:yang:ietf-yang-patch</namespace>
    <conformance-type>import</conformance-type>
  </module>
  <module>
    <name>ietf-ip</name>
    <revision>2018-02-22</revision>
    <schema>file:///etc/sysrepo/yang/ietf-ip@2018-02-22.yang</schema>
    <namespace>urn:ietf:params:xml:ns:yang:ietf-ip</namespace>
    <conformance-type>implement</conformance-type>
  </module>
</modules-state>

Figure 3: Excerpt of YANG Library data collected from a NETCONF
server
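The effect of the iterator "/yanglib:modules-state/yanglib:module"
used in the mappings can be approximated with Python's ElementTree,
which supports a subset of XPath. This sketch runs on an abridged
copy of Figure 3 (the schema and namespace leaves are omitted) and
only extracts leaf values; it is not BURP's implementation and
produces no RDF.

```python
import xml.etree.ElementTree as ET

NS = {"yanglib": "urn:ietf:params:xml:ns:yang:ietf-yang-library"}

# Abridged copy of Figure 3 (schema and namespace leaves omitted).
RAW = """<modules-state xmlns="urn:ietf:params:xml:ns:yang:ietf-yang-library"
               xmlns:nc="urn:ietf:params:xml:ns:netconf:base:1.0">
  <module-set-id>1</module-set-id>
  <module>
    <name>ietf-yang-patch</name>
    <revision>2017-02-22</revision>
    <conformance-type>import</conformance-type>
  </module>
  <module>
    <name>ietf-ip</name>
    <revision>2018-02-22</revision>
    <conformance-type>implement</conformance-type>
  </module>
</modules-state>"""

LEAVES = ("name", "revision", "conformance-type")


def iterate_modules(xml_text: str) -> list[dict[str, str]]:
    """Approximate /yanglib:modules-state/yanglib:module and read selected leaves."""
    root = ET.fromstring(xml_text)  # the root element is modules-state itself
    return [
        {leaf: module.findtext(f"yanglib:{leaf}", default="", namespaces=NS) for leaf in LEAVES}
        for module in root.findall("yanglib:module", NS)
    ]


for row in iterate_modules(RAW):
    print(row)
```

Each dictionary corresponds to one iteration of the mapping; an RML
engine would then apply the subject and predicate-object maps to it.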

A.3.2. Mappings

@prefix yl: <https://w3id.org/yang/library#> .
@prefix ys: <https://w3id.org/yang/server#> .
@prefix rml: <http://w3id.org/rml/> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .
@prefix core: <https://ontology.unifiedcyberontology.org/uco/core/> .
@prefix dcterms: <http://purl.org/dc/terms/> .
@prefix observable: <https://ontology.unifiedcyberontology.org/uco/observable/> .

@base <https://netconf-rml-demo.org/> .

# Connection details to NETCONF server
<netconf-server-1> a ys:NetconfServer ;
    ys:socketAddress <netconf-server-1/socket-address> ;
    ys:serverAccount <netconf-server-1/account> ;
    ys:hostKeyVerification "false" ;
    ys:capability ys:XpathCapability ,
        ys:YangLibrary1.0 .

<netconf-server-1/datastores/operational> a ys:OperationalDatastore ;
    ys:server <netconf-server-1> .

<netconf-server-1/datastores/running> a ys:RunningDatastore ;
    ys:server <netconf-server-1> .

<netconf-server-1/socket-address> a observable:SocketAddress ;
    observable:addressValue "localhost:830" .

<netconf-server-1/account> a ys:ServerAccount ;
    ys:username "netconf" ;
    core:hasFacet <netconf-server-1/account/authentication> .

<netconf-server-1/account/authentication> a observable:AccountAuthenticationFacet ;
    observable:password "netconf" ;
    observable:passwordType "plain-text" .

<filters/xpath/yang-library> a ys:XPathFilter ;
    ys:xpathValue "/yanglib:modules-state";
    ys:namespace [ a ys:Namespace ;
        ys:namespacePrefix "yanglib" ;
        ys:namespaceURL "urn:ietf:params:xml:ns:yang:ietf-yang-library" ;
    ];
.

<#TriplesMap> a rml:TriplesMap;
    rml:logicalSource [ a rml:LogicalSource;
        rml:source [ a ys:Query, rml:Source ;
            ys:sourceDatastore <netconf-server-1/datastores/operational> ;
            ys:filter <filters/xpath/yang-library>
        ];
        rml:referenceFormulation [ a ys:NetconfQuerySource ;
            rml:namespace [ a rml:Namespace ;
                rml:namespacePrefix "yanglib" ;
                rml:namespaceURL "urn:ietf:params:xml:ns:yang:ietf-yang-library" ;
            ];
        ];
        rml:iterator "/yanglib:modules-state/yanglib:module";
    ];
    rml:subjectMap [ a rml:SubjectMap;
        rml:template "http://example.org/module/{yanglib:name/text()}:{yanglib:revision/text()}";
        rml:class yl:Module;
    ];
    rml:predicateObjectMap [ a rml:PredicateObjectMap;
        rml:predicateMap [ a rml:PredicateMap;
            rml:constant yl:moduleName;
        ];
        rml:objectMap [ a rml:ObjectMap;
            rml:reference "yanglib:name/text()";
            rml:datatype xsd:string;
        ];
    ];
    rml:predicateObjectMap [ a rml:PredicateObjectMap;
        rml:predicateMap [ a rml:PredicateMap;
            rml:constant yl:revisionDate;
        ];
        rml:objectMap [ a rml:ObjectMap;
            rml:reference "yanglib:revision/text()";
            rml:datatype xsd:date;
        ];
    ];
    rml:predicateObjectMap [ a rml:PredicateObjectMap;
        rml:predicateMap [ a rml:PredicateMap;
            rml:constant yl:namespace;
        ];
        rml:objectMap [ a rml:ObjectMap;
            rml:reference "yanglib:namespace/text()";
            rml:datatype xsd:anyURI;
        ];
    ];
.

Figure 4: RML mappings that collect YANG Library data from a NETCONF
server and map them to the YANG Library Ontology
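To make the semantics of the TriplesMap in Figure 4 concrete, the
following Python sketch (standard library only) reproduces by hand
what an RML processor would do with the Figure 3 data: it applies the
iterator, builds the subject IRI from the template, and emits one
typed literal per predicate-object map plus the rdf:type triple
derived from rml:class. It is not BURP's implementation. The names
RAW, POMS, literal, and map_modules are hypothetical, the XML is
hardcoded rather than fetched from a NETCONF server (the schema and
conformance-type elements are omitted), and urllib's percent-encoding
only approximates the IRI-safe encoding RML applies to template
values (it also escapes non-ASCII characters).

```python
import xml.etree.ElementTree as ET
from urllib.parse import quote

YL = "https://w3id.org/yang/library#"
XSD = "http://www.w3.org/2001/XMLSchema#"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
NS = {"yanglib": "urn:ietf:params:xml:ns:yang:ietf-yang-library"}

# Trimmed copy of the Figure 3 data (schema and conformance-type omitted).
RAW = """<modules-state xmlns="urn:ietf:params:xml:ns:yang:ietf-yang-library">
  <module-set-id>1</module-set-id>
  <module>
    <name>ietf-yang-patch</name>
    <revision>2017-02-22</revision>
    <namespace>urn:ietf:params:xml:ns:yang:ietf-yang-patch</namespace>
  </module>
  <module>
    <name>ietf-ip</name>
    <revision>2018-02-22</revision>
    <namespace>urn:ietf:params:xml:ns:yang:ietf-ip</namespace>
  </module>
</modules-state>"""

# One entry per rml:predicateObjectMap: (predicate, reference, xsd datatype).
# The "/text()" step of each RML reference is implied by findtext().
POMS = [
    ("moduleName", "yanglib:name", "string"),
    ("revisionDate", "yanglib:revision", "date"),
    ("namespace", "yanglib:namespace", "anyURI"),
]


def literal(value: str, datatype: str) -> str:
    """Serialize a typed literal in N-Triples syntax."""
    escaped = (value.replace("\\", "\\\\").replace('"', '\\"')
               .replace("\n", "\\n").replace("\r", "\\r"))
    return f'"{escaped}"^^<{XSD}{datatype}>'


def map_modules(xml_text: str) -> list[str]:
    """Apply the Figure 4 TriplesMap to YANG Library XML by hand."""
    root = ET.fromstring(xml_text)
    # rml:iterator "/yanglib:modules-state/yanglib:module": select nothing
    # unless the root is modules-state, then iterate over its module children.
    if root.tag != "{" + NS["yanglib"] + "}modules-state":
        return []
    triples = []
    for module in root.findall("yanglib:module", NS):
        name = module.findtext("yanglib:name", namespaces=NS)
        revision = module.findtext("yanglib:revision", namespaces=NS)
        if name is None or revision is None:
            continue  # a template with a missing reference yields no subject
        # rml:template; quote() percent-encodes everything outside [A-Za-z0-9_.~-].
        subject = (f"<http://example.org/module/"
                   f"{quote(name, safe='')}:{quote(revision, safe='')}>")
        for predicate, ref, datatype in POMS:
            value = module.findtext(ref, namespaces=NS)
            if value is not None:
                triples.append(
                    f"{subject} <{YL}{predicate}> {literal(value, datatype)} .")
        # rml:class yl:Module
        triples.append(f"{subject} <{RDF_TYPE}> <{YL}Module> .")
    return triples


for triple in map_modules(RAW):
    print(triple)
```

Running the sketch prints the eight triples shown in Figure 5 as
N-Triples, one triple per line, in document order rather than in the
order used in the figure; RDF graphs are unordered, so the two
outputs are equivalent.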

A.3.3. RDF data

<http://example.org/module/ietf-ip:2018-02-22>
    <https://w3id.org/yang/library#moduleName>
    "ietf-ip"^^<http://www.w3.org/2001/XMLSchema#string> .
<http://example.org/module/ietf-ip:2018-02-22>
    <https://w3id.org/yang/library#revisionDate>
    "2018-02-22"^^<http://www.w3.org/2001/XMLSchema#date> .
<http://example.org/module/ietf-ip:2018-02-22>
    <https://w3id.org/yang/library#namespace>
    "urn:ietf:params:xml:ns:yang:ietf-ip"^^<http://www.w3.org/2001/XMLSchema#anyURI> .
<http://example.org/module/ietf-ip:2018-02-22>
    <http://www.w3.org/1999/02/22-rdf-syntax-ns#type>
    <https://w3id.org/yang/library#Module> .
<http://example.org/module/ietf-yang-patch:2017-02-22>
    <https://w3id.org/yang/library#moduleName>
    "ietf-yang-patch"^^<http://www.w3.org/2001/XMLSchema#string> .
<http://example.org/module/ietf-yang-patch:2017-02-22>
    <https://w3id.org/yang/library#revisionDate>
    "2017-02-22"^^<http://www.w3.org/2001/XMLSchema#date> .
<http://example.org/module/ietf-yang-patch:2017-02-22>
    <https://w3id.org/yang/library#namespace>
    "urn:ietf:params:xml:ns:yang:ietf-yang-patch"^^<http://www.w3.org/2001/XMLSchema#anyURI> .
<http://example.org/module/ietf-yang-patch:2017-02-22>
    <http://www.w3.org/1999/02/22-rdf-syntax-ns#type>
    <https://w3id.org/yang/library#Module> .

Figure 5: Excerpt of RDF triples generated using the RML mappings
and the YANG Library data

Acknowledgments

This document is based on work partially funded by the EU Horizon
Europe projects aerOS (grant agreement no. 101069732) and ROBUST-6G
(grant agreement no. 101139068).

The authors would like to thank Med, Benoit, Lionel, and Thomas for
their review and valuable comments.

Authors' Addresses

Ignacio Dominguez Martinez-Casanueva
Telefonica
Email: ignacio.dominguezmartinez@telefonica.com

Lucia Cabanillas
Telefonica
Email: lucia.cabanillasrodriguez@telefonica.com

Pedro Martinez-Julia
NICT
Email: pedro@nict.go.jp