eds4jinja2.adapters package

Submodules

eds4jinja2.adapters.base_data_source module

An abstract data source to be implemented by the specific data source adapters.

exception UnsupportedRepresentation[source]

Bases: Exception

Unsupported representation exception

class DataSource[source]

Bases: abc.ABC

A generic data source that fetches data either in tabular or tree representation.

Fail-safe execution is the default, so that the entire context is returned to the template.

>>> content, error = data_source.fetch_tabular()

Exception-prone execution is performed with the underscored functions:

>>> content, error = data_source._fetch_tabular()

To fetch a tree

>>> content, error = data_source.fetch_tree()
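
The fail-safe pattern described above can be sketched as follows. This is a minimal illustration, not the actual implementation: the wrapper catches any exception from the underscored method and returns it as the second element of a (content, error) tuple; `BrokenSource` is a hypothetical subclass used only to demonstrate the behaviour.

```python
from abc import ABC, abstractmethod
from typing import Optional, Tuple

class DataSource(ABC):
    """Minimal sketch of the fail-safe fetch pattern."""

    @abstractmethod
    def _fetch_tabular(self):
        """Exception-prone fetch; may raise."""

    def fetch_tabular(self) -> Tuple[Optional[object], Optional[str]]:
        # Fail-safe wrapper: never raises, returns (content, error) instead.
        try:
            return self._fetch_tabular(), None
        except Exception as e:
            return None, str(e)

class BrokenSource(DataSource):  # hypothetical subclass for illustration
    def _fetch_tabular(self):
        raise ValueError("no data available")

content, error = BrokenSource().fetch_tabular()
print(content, error)  # None no data available
```

This way a template can always unpack both values and decide how to render the error, instead of aborting the rendering run.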
fetch_tabular() Tuple[Optional[object], Optional[str]][source]

Read the content from the data source in tabular structure.

Returns

a tuple where the first element is a result of a successful data reading and the second is the error message in case of failure

fetch_tree() Tuple[Optional[object], Optional[str]][source]

Read the content from the data source and return a tree structure.

Returns

a tuple where the first element is a result of a successful data reading and the second is the error message in case of failure

abstract _can_be_tabular() bool[source]
abstract _can_be_tree() bool[source]
abstract _fetch_tabular()[source]

fetch data and return as tabular representation

Returns

abstract _fetch_tree()[source]

fetch data and return as tree representation

Returns

eds4jinja2.adapters.file_ds module

class FileDataSource(file_path)[source]

Bases: eds4jinja2.adapters.base_data_source.DataSource

Fetches data from a file. Automatically determines the file type.

  • Supported tabular file types: “.csv”, “.tsv”, “.xlsx”, “.xls”

  • Supported tree file types: “.json”, “.yaml”, “.yml”, “.toml”, “.json-ld”, “.jsonld”

To read a CSV

>>> ds = FileDataSource("path/to/the/file.csv")
>>> pd_data_frame, error = ds.fetch_tabular()

To read a JSON

>>> ds = FileDataSource("path/to/the/file.json")
>>> pd_data_frame, error = ds.fetch_tree()
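
The tabular/tree decision can be sketched as a simple file-extension check. This is an assumption about how `_can_be_tabular()` and `_can_be_tree()` might work, using the extension lists documented above; the real implementation may differ.

```python
from pathlib import Path

# Extension lists taken from the supported file types documented above.
TABULAR_EXTENSIONS = {".csv", ".tsv", ".xlsx", ".xls"}
TREE_EXTENSIONS = {".json", ".yaml", ".yml", ".toml", ".json-ld", ".jsonld"}

def can_be_tabular(file_path: str) -> bool:
    # Path.suffix yields the extension after the last dot, e.g. ".csv"
    return Path(file_path).suffix.lower() in TABULAR_EXTENSIONS

def can_be_tree(file_path: str) -> bool:
    return Path(file_path).suffix.lower() in TREE_EXTENSIONS

print(can_be_tabular("path/to/the/file.csv"))  # True
print(can_be_tree("path/to/the/file.json"))    # True
```
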
property file_path: pathlib.Path

The location of the DataSource file

Returns

property _file_extension
_can_be_tree() bool[source]
_can_be_tabular() bool[source]
_fetch_tree()[source]

fetch data and return as tree representation

Returns

_fetch_tabular()[source]

fetch data and return as tabular representation

Returns

eds4jinja2.adapters.http_ds module

class HTTPDataSource[source]

Bases: eds4jinja2.adapters.base_data_source.DataSource

Fetching data by HTTP calls

eds4jinja2.adapters.latex_utils module

Latex parsing functionality

This module provides functions to parse latex text

Credits: this module was adapted from Samuel Roeca’s latexbuild repository (pappasam/latexbuild on GitHub)

escape_latex(value: str)[source]

Escape a LaTeX string

Parameters

value

Returns
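
As an illustration, LaTeX escaping typically replaces the characters that LaTeX treats as special. The following is a minimal sketch of the idea; the actual escape_latex implementation may cover a different character set.

```python
# Hypothetical escaping table for LaTeX's special characters.
LATEX_SPECIAL = {
    "&": r"\&", "%": r"\%", "$": r"\$", "#": r"\#",
    "_": r"\_", "{": r"\{", "}": r"\}",
    "~": r"\textasciitilde{}", "^": r"\^{}", "\\": r"\textbackslash{}",
}

def escape_latex(value: str) -> str:
    # Replace each special character; leave everything else untouched.
    return "".join(LATEX_SPECIAL.get(ch, ch) for ch in value)

print(escape_latex("100% of $5 & more_stuff"))
# 100\% of \$5 \& more\_stuff
```
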

eds4jinja2.adapters.local_sparql_ds module

class RDFFileDataSource(filename)[source]

Bases: eds4jinja2.adapters.base_data_source.DataSource

Accesses a local RDF file and provides the possibility to fetch data from it by SPARQL queries.

__reduce_bound_triple_to_string_format(dict_of_bound_variables: dict)
with_query(sparql_query: str, substitution_variables: Optional[dict] = None, prefixes: str = '') eds4jinja2.adapters.local_sparql_ds.RDFFileDataSource[source]

Set the query text and return the reference to self for chaining.

Returns

with_query_from_file(sparql_query_file_path: str, substitution_variables: Optional[dict] = None, prefixes: str = '') eds4jinja2.adapters.local_sparql_ds.RDFFileDataSource[source]

Set the query text and return the reference to self for chaining.

Returns

with_file(file: str) eds4jinja2.adapters.local_sparql_ds.RDFFileDataSource[source]

Set the RDF file to be queried and return the reference to self for chaining.

Returns

_fetch_tabular()[source]

fetch data and return as tabular representation

Returns

_fetch_tree()[source]

fetch data and return as tree representation

Returns

_can_be_tree() bool[source]
_can_be_tabular() bool[source]
eds4jinja2.adapters.namespace_handler module

This module deals with namespace management over Pandas DataFrames:

  • Discovery of prefixes and namespaces in a DataFrame

  • prefix.cc lookup

  • Maintenance of a namespace inventory

  • Shortening of URIs to their QName forms

class NamespaceInventory(namespace_definition_dict=None)[source]

Bases: rdflib.namespace.NamespaceManager

namespaces_as_dict()[source]

Returns

the namespace definitions as a dict

uri_to_qname(uri_string, prefix_cc_lookup=True, error_fail=False)[source]

Transform the uri_string to a qname string and remember the namespace. If the namespace is not defined, the prefix can be looked up on prefix.cc

Parameters
  • error_fail – whether the errors shall fail hard or just issue a warning

  • prefix_cc_lookup – whether to look up the namespace on prefix.cc in case it is unknown

  • uri_string – the string of a URI to be reduced to a QName

Returns

qname string
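
The URI-to-QName shortening can be sketched as a longest-prefix match against the namespace inventory. This is a simplified, self-contained illustration; the real method is backed by rdflib's NamespaceManager and can additionally consult prefix.cc for unknown namespaces.

```python
def uri_to_qname(uri_string: str, inventory: dict) -> str:
    """Sketch: shorten a URI to prefix:local_name using a
    {prefix: namespace} dict (hypothetical stand-in for the inventory)."""
    for prefix, namespace in inventory.items():
        if uri_string.startswith(namespace):
            return f"{prefix}:{uri_string[len(namespace):]}"
    raise ValueError(f"No known namespace for {uri_string}")

inventory = {"skos": "http://www.w3.org/2004/02/skos/core#"}
print(uri_to_qname("http://www.w3.org/2004/02/skos/core#Concept", inventory))
# skos:Concept
```
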

qname_to_uri(qname_string: str, prefix_cc_lookup=True, error_fail=False) str[source]

Transform the QName into an URI

Parameters
  • qname_string – the qname string to be expanded to a URI

  • prefix_cc_lookup – whether to look up missing prefixes on http://prefix.cc

  • error_fail – whether the errors shall fail hard or just issue a warning

Returns

the absolute URI string

simplify_uris_in_tabular(data_frame: pandas.core.frame.DataFrame, namespace_inventory: eds4jinja2.adapters.namespace_handler.NamespaceInventory, target_columns: Optional[List] = None, prefix_cc_lookup=True, inplace=True, error_fail=True) pandas.core.frame.DataFrame[source]

Replace the full URIs with their QName counterparts, discovering namespaces in the process if they are not yet defined.

Parameters
  • namespace_inventory – the namespace inventory to be used for replacement resolution

  • error_fail – whether to fail hard on error or issue a warning per data_frame cell

  • inplace – indicate whether the current data_frame shall be modified or a new one be created instead

  • prefix_cc_lookup – whether to look up unknown namespaces on prefix.cc

  • target_columns – the target columns to explore; these columns are expected to contain only URIs as values

  • data_frame – the dataframe to explore

Returns

the DataFrame with replaced values

eds4jinja2.adapters.prefix_cc_fetcher module

prefix_cc_lookup_prefix(prefix: str) Dict[source]

Look up a prefix via the prefix.cc API and return the base namespace.

prefix_cc_lookup_base_uri(base_uri: str) Dict[source]

Look up a base namespace via the prefix.cc API and return the first prefix (shortest and first in an ordered list). If the base_uri is not in the namespace definitions, then return None.

prefix_cc_all() Dict[source]

Return all definitions from the prefix.cc

eds4jinja2.adapters.remote_sparql_ds module

class SPARQLClientPool(*args, **kwargs)[source]

Bases: eds4jinja2.adapters.remote_sparql_ds.SPARQLClientPool

A singleton connection pool that hosts a dictionary of endpoint URLs and the corresponding SPARQLWrapper objects connecting to them.

The rationale of this connection pool is to reuse connection objects and save time.

inited = False
inst = <eds4jinja2.adapters.remote_sparql_ds.SPARQLClientPool object>
classmethod instance()

get the singleton instance
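
The connection-pool idea can be sketched as follows. This is a minimal, hypothetical illustration of the singleton-plus-dictionary pattern; the real class wraps SPARQLWrapper objects rather than the placeholder used here.

```python
from typing import Dict

class SPARQLClientPool:
    """Sketch: one shared dictionary mapping endpoint URLs to client
    objects, so that connections are created once and then reused."""
    _instance = None
    _pool: Dict[str, object] = {}

    @classmethod
    def instance(cls):
        # Lazily create the single shared instance.
        if cls._instance is None:
            cls._instance = cls()
        return cls._instance

    def create_or_reuse(self, endpoint_url: str):
        # A real implementation would build a SPARQLWrapper here (assumption).
        if endpoint_url not in self._pool:
            self._pool[endpoint_url] = object()
        return self._pool[endpoint_url]

pool = SPARQLClientPool.instance()
a = pool.create_or_reuse("http://example.org/sparql")
b = pool.create_or_reuse("http://example.org/sparql")
print(a is b)  # True
```
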

class RemoteSPARQLEndpointDataSource(endpoint_url)[source]

Bases: eds4jinja2.adapters.base_data_source.DataSource

Fetches data from SPARQL endpoint. Can be used either with a SPARQL query or a URI to be described.

To query a SPARQL endpoint and get the results as dict object

>>> ds = RemoteSPARQLEndpointDataSource(sparql_endpoint_url)
>>> dict_object = ds.with_query(sparql_query_text)._fetch_tree()

Unpack the content and error for a fail-safe fetching:

>>> dict_object, error_string = ds.with_query(sparql_query_text).fetch_tree()

To describe a URI and get the results as a pandas DataFrame

>>> pd_dataframe = ds.with_uri(existent_uri)._fetch_tree()

Unpack the content and error for a fail-safe fetching:

>>> pd_dataframe, error_string = ds.with_uri(existent_uri).fetch_tree()

In case you want to target the URI description from a named graph:

>>> pd_dataframe, error_string = ds.with_uri(existent_uri, named_graph).fetch_tree()
with_query(sparql_query: str, substitution_variables: Optional[dict] = None, sparql_prefixes: str = '') eds4jinja2.adapters.remote_sparql_ds.RemoteSPARQLEndpointDataSource[source]

Set the query text and return the reference to self for chaining.

Returns

with_query_from_file(sparql_query_file_path: str, substitution_variables: Optional[dict] = None, prefixes: str = '') eds4jinja2.adapters.remote_sparql_ds.RemoteSPARQLEndpointDataSource[source]

Set the query text and return the reference to self for chaining.

Returns

with_uri(uri: str, graph_uri: Optional[str] = None) eds4jinja2.adapters.remote_sparql_ds.RemoteSPARQLEndpointDataSource[source]

Set the URI to be described and return the reference to self for chaining.

Returns

_fetch_tree()[source]

fetch data and return as tree representation

Returns

_fetch_tabular()[source]

fetch data and return as tabular representation

Returns

_can_be_tree() bool[source]
_can_be_tabular() bool[source]
eds4jinja2.adapters.substitution_template module

class SubstitutionTemplate(template)[source]

Bases: string.Template

delimiter = '~'
pattern = re.compile('\n    \\~(?:\n      (?P<escaped>\\~) |   # Escape sequence of two delimiters\n      (?P<named>(?a:[_a-z][_a-z0-9]*))      |   # delimiter and a Python identifier\n      {(?P<braced>(?a:[_a-z][_a-z0-9]*))}   |   # delimiter and a braced identifier\n      (?P<invalid>)              # Other ill-formed delimiter exprs\n    )\n    ', re.IGNORECASE|re.VERBOSE)
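
Because SubstitutionTemplate only overrides the delimiter of string.Template, the behaviour can be reproduced with the standard library alone:

```python
from string import Template

class SubstitutionTemplate(Template):
    """A string.Template whose placeholder delimiter is '~'
    instead of the default '$', matching the class above."""
    delimiter = "~"

# '~type' is a placeholder; literal braces are left untouched.
t = SubstitutionTemplate("SELECT * WHERE { ?s a ~type }")
print(t.substitute(type="skos:Concept"))
# SELECT * WHERE { ?s a skos:Concept }
```

The '~' delimiter avoids clashes with '$' and '{}', both of which occur frequently in SPARQL queries and Jinja2 templates.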

eds4jinja2.adapters.tabular_utils module

A set of helper functions to easily polish tabular data (i.e. Pandas DataFrames).

replace_strings_in_tabular(data_frame: pandas.core.frame.DataFrame, target_columns: Optional[List[str]] = None, value_mapping_dict: Optional[Dict] = None, mark_touched_rows: bool = False) List[str][source]

Replaces the values from the target columns in a data frame according to the value-mapping dictionary. If the inverted_mapping flag is true, then the inverted value_mapping_dict is considered. If mark_touched_rows is true, then a boolean column _touched_ is added, indicating which rows were updated.

>>> value_mapping_dict = {"old value 1": "new value 1", "old value 2": "new value 2"}
Parameters
  • mark_touched_rows – add a new boolean column _touched_ indicating which rows were updated

  • value_mapping_dict – the string substitution mapping

  • target_columns – a list of column names; otherwise leave empty if the substitution applies to all columns

  • data_frame – the data frame

Returns

the list of unique strings found in the dataframe
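
The substitution idea can be sketched in pure Python over a list of row dicts (the real function operates on a Pandas DataFrame instead; the function body below is an assumption for illustration):

```python
def replace_strings_in_tabular(rows, value_mapping_dict,
                               target_columns=None, mark_touched_rows=False):
    """Sketch: replace cell values per the mapping and optionally mark
    which rows were touched, mirroring the documented behaviour."""
    for row in rows:
        touched = False
        for col in (target_columns or list(row.keys())):
            if row.get(col) in value_mapping_dict:
                row[col] = value_mapping_dict[row[col]]
                touched = True
        if mark_touched_rows:
            row["_touched_"] = touched
    return rows

rows = [{"status": "old value 1"}, {"status": "other"}]
replace_strings_in_tabular(rows, {"old value 1": "new value 1"},
                           mark_touched_rows=True)
print(rows)
# [{'status': 'new value 1', '_touched_': True}, {'status': 'other', '_touched_': False}]
```
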

add_relative_figures(data_frame: pandas.core.frame.DataFrame, target_columns: List[str], relativisers: List, percentage: bool = True)[source]

For each of the target_columns, add a calculated column with relative values based on the provided relativisers.

Parameters
  • percentage

  • data_frame

  • target_columns

  • relativisers – a list of indicators corresponding to the target_columns comprising either None, a number or a column name

Returns
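
Based on the parameter description above, each relativiser is either None, a number, or a column name used as the denominator. The sketch below is an assumption about the mechanics (including the hypothetical `_relative` suffix for the new column), shown over plain row dicts rather than a DataFrame:

```python
def add_relative_figures(rows, target_columns, relativisers, percentage=True):
    """Sketch: for each target column, divide by the relativiser
    (a number or another column's value) and optionally scale to percent."""
    factor = 100 if percentage else 1
    for row in rows:
        for col, rel in zip(target_columns, relativisers):
            if rel is None:
                continue  # no relative figure requested for this column
            denominator = row[rel] if isinstance(rel, str) else rel
            row[f"{col}_relative"] = factor * row[col] / denominator

rows = [{"count": 30, "total": 120}]
add_relative_figures(rows, ["count"], ["total"])
print(rows[0]["count_relative"])  # 25.0
```
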

Module contents

sort_by_size_and_alphabet(l: List) List[source]

Sort an iterable by size and alphabetically

Parameters

l

Returns
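
The ordering described above can be sketched with a composite sort key, which is presumably how the function behaves:

```python
from typing import List

def sort_by_size_and_alphabet(l: List) -> List:
    """Sketch: order items by length first, then alphabetically."""
    return sorted(l, key=lambda item: (len(item), item))

print(sort_by_size_and_alphabet(["skos", "a", "rdf", "owl"]))
# ['a', 'owl', 'rdf', 'skos']
```
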

first_key(d: Optional[typing.Dict]) object[source]

Return the first dict key, where the keys are ordered first by their length and then alphabetically.

first_key_value(d: Optional[typing.Dict]) object[source]

Return the dict value for the first key in the dict; the first key is determined using the first_key function.
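
A plausible sketch of both helpers, using the length-then-alphabet ordering described above:

```python
def first_key(d):
    """Sketch: the first key when ordered by length, then alphabetically."""
    if not d:
        return None
    return sorted(d, key=lambda k: (len(k), k))[0]

def first_key_value(d):
    """Sketch: the value stored under the first key."""
    return d[first_key(d)]

d = {"rdfs": 1, "rdf": 2, "owl": 3}
print(first_key(d), first_key_value(d))  # owl 3
```
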

invert_dict(mapping_dict: Dict, reduce_values: bool = True)[source]

Invert the dictionary by swapping keys and values. In case the values are unique then the inverted dict will be of the same size as the initial one. Otherwise it will be shrunk to the unique values and the keys will be cumulated in a list.

The list can be reduced to a single item by setting reduce_values=True.

>>> d = {"a":1, "b":2, "c":1}
>>> invert_dict(d)
{1: 'a', 2: 'b'}
>>> invert_dict(d, False)
{1: ['a', 'c'], 2: ['b']}
Parameters

reduce_values – If reduce_values is true then the values are single items otherwise the values are list of possibly multiple items.
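
A sketch matching the doctest above; the choice of which colliding key survives reduction (here: shortest, then alphabetical) is an assumption:

```python
def invert_dict(mapping_dict, reduce_values=True):
    """Sketch: swap keys and values, cumulating colliding keys into a
    list unless reduce_values is True."""
    inverted = {}
    for key, value in mapping_dict.items():
        inverted.setdefault(value, []).append(key)
    if reduce_values:
        # Keep a single key per value; pick the first by length/alphabet.
        return {v: sorted(ks, key=lambda k: (len(k), k))[0]
                for v, ks in inverted.items()}
    return inverted

d = {"a": 1, "b": 2, "c": 1}
print(invert_dict(d))         # {1: 'a', 2: 'b'}
print(invert_dict(d, False))  # {1: ['a', 'c'], 2: ['b']}
```
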

deep_update(source, overrides)[source]

Update a nested dictionary or similar mapping. Modify source in place.

Adapted from https://stackoverflow.com/questions/3232943/update-value-of-a-nested-dictionary-of-varying-depth
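
The referenced Stack Overflow recipe boils down to recursing into nested mappings and overwriting leaves, which can be sketched as:

```python
def deep_update(source, overrides):
    """Sketch: recurse into dicts on both sides, overwrite everything
    else; modifies source in place and returns it."""
    for key, value in overrides.items():
        if isinstance(value, dict) and isinstance(source.get(key), dict):
            deep_update(source[key], value)  # merge nested dicts
        else:
            source[key] = value              # overwrite leaf values
    return source

cfg = {"db": {"host": "localhost", "port": 5432}}
deep_update(cfg, {"db": {"port": 5433}})
print(cfg)  # {'db': {'host': 'localhost', 'port': 5433}}
```

Note that sibling keys ("host" above) survive the update, which a plain dict.update would have discarded.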