eds4jinja2.adapters package¶
Submodules¶
eds4jinja2.adapters.base_data_source module¶
An abstract data source that is implemented by the specific ones.
- class DataSource[source]¶
Bases:
abc.ABC
A generic data source that fetches data either in tabular or tree representation.
By default, data is fetched in fail-safe mode, so that the entire context, including any error message, is provided back to the template.
>>> content, error = data_source.fetch_tabular()
For exception-prone fetching, use the underscored functions
>>> content = data_source._fetch_tabular()
To fetch a tree
>>> content, error = data_source.fetch_tree()
- fetch_tabular() Tuple[Optional[object], Optional[str]] [source]¶
Read the content from the data source in tabular structure.
- Returns
a tuple where the first element is a result of a successful data reading and the second is the error message in case of failure
- fetch_tree() Tuple[Optional[object], Optional[str]] [source]¶
Read the content from the data source and return a tree structure.
- Returns
a tuple where the first element is a result of a successful data reading and the second is the error message in case of failure
- _abc_impl = <_abc_data object>¶
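The relationship between the fail-safe and the underscored methods can be illustrated with a minimal sketch. It is not the library's implementation: `SketchDataSource`, `BrokenSource` and `WorkingSource` are hypothetical names, and the sketch assumes that the underscored method returns the bare content and raises on failure, while the public method converts any exception into an error string.

```python
from abc import ABC, abstractmethod
from typing import Optional, Tuple


class SketchDataSource(ABC):
    """Hypothetical stand-in for DataSource, showing the fail-safe wrapper idea."""

    def fetch_tabular(self) -> Tuple[Optional[object], Optional[str]]:
        # Fail-safe variant: never raises, reports the problem as a string instead.
        try:
            return self._fetch_tabular(), None
        except Exception as exc:
            return None, str(exc)

    @abstractmethod
    def _fetch_tabular(self) -> object:
        """Exception-prone variant: returns the content or raises."""


class BrokenSource(SketchDataSource):
    def _fetch_tabular(self) -> object:
        raise ValueError("no data available")


class WorkingSource(SketchDataSource):
    def _fetch_tabular(self) -> object:
        return [["a", 1], ["b", 2]]


# The caller always gets a (content, error) pair and can hand both to a template.
content, error = BrokenSource().fetch_tabular()
ok_content, ok_error = WorkingSource().fetch_tabular()
```

With this shape, a template can render either the content or the error message without the rendering process itself failing.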
eds4jinja2.adapters.file_ds module¶
- class FileDataSource(file_path)[source]¶
Bases:
eds4jinja2.adapters.base_data_source.DataSource
Fetches data from a file. Automatically determines the file type.
Supported tabular file types: “.csv”, “.tsv”, “.xlsx”, “.xls”
Supported tree file types: “.json”, “.yaml”, “.yml”, “.toml”, “.json-ld”, “.jsonld”
To read a CSV file
>>> ds = FileDataSource("path/to/the/file.csv")
>>> pd_data_frame, error = ds.fetch_tabular()
To read a JSON file
>>> ds = FileDataSource("path/to/the/file.json")
>>> dict_object, error = ds.fetch_tree()
- property file_path: pathlib.Path¶
The location of the data source file.
- Returns
the file path as a pathlib.Path object
- property _file_extension¶
- _abc_impl = <_abc_data object>¶
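Choosing a reader from the file extension can be sketched with the standard library alone. This is only an illustration of the idea, not the code of FileDataSource: `read_by_extension` is a hypothetical helper, it covers just a few of the listed extensions, and it returns plain lists and dicts rather than a pandas DataFrame.

```python
import csv
import json
import tempfile
from pathlib import Path

# Only a subset of the documented extensions is handled in this sketch.
TABULAR_EXTENSIONS = {".csv", ".tsv"}
TREE_EXTENSIONS = {".json"}


def read_by_extension(file_path):
    """Hypothetical helper: pick a reader based on the file extension."""
    path = Path(file_path)
    extension = path.suffix.lower()
    if extension in TABULAR_EXTENSIONS:
        delimiter = "\t" if extension == ".tsv" else ","
        with path.open(newline="", encoding="utf-8") as handle:
            return list(csv.DictReader(handle, delimiter=delimiter))
    if extension in TREE_EXTENSIONS:
        with path.open(encoding="utf-8") as handle:
            return json.load(handle)
    raise ValueError(f"Unsupported file type: {extension}")


with tempfile.TemporaryDirectory() as tmp_dir:
    csv_file = Path(tmp_dir) / "people.csv"
    csv_file.write_text("name,age\nAda,36\n", encoding="utf-8")
    json_file = Path(tmp_dir) / "config.json"
    json_file.write_text('{"title": "Report"}', encoding="utf-8")

    rows = read_by_extension(csv_file)   # tabular content
    tree = read_by_extension(json_file)  # tree content
```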
eds4jinja2.adapters.http_ds module¶
- class HTTPDataSource[source]¶
Bases:
eds4jinja2.adapters.base_data_source.DataSource
Fetching data by HTTP calls
- _abc_impl = <_abc_data object>¶
eds4jinja2.adapters.latex_utils module¶
LaTeX parsing functionality
This module provides functions to parse LaTeX text.
Credits: This module was adapted from Samuel Roeca’s latexbuild repository (pappasam/latexbuild on GitHub).
eds4jinja2.adapters.local_sparql_ds module¶
- class RDFFileDataSource(filename)[source]¶
Bases:
eds4jinja2.adapters.base_data_source.DataSource
Accesses a local RDF file and provides the possibility to fetch data from it by SPARQL queries.
- __reduce_bound_triple_to_string_format(dict_of_bound_variables: dict)¶
- with_query(sparql_query: str, substitution_variables: Optional[dict] = None, prefixes: str = '') eds4jinja2.adapters.local_sparql_ds.RDFFileDataSource [source]¶
Set the query text and return the reference to self for chaining.
- Returns
this RDFFileDataSource instance
- with_query_from_file(sparql_query_file_path: str, substitution_variables: Optional[dict] = None, prefixes: str = '') eds4jinja2.adapters.local_sparql_ds.RDFFileDataSource [source]¶
Set the query text from the content of a file and return the reference to self for chaining.
- Returns
this RDFFileDataSource instance
- with_file(file: str) eds4jinja2.adapters.local_sparql_ds.RDFFileDataSource [source]¶
Set the RDF file to be queried and return the reference to self for chaining.
- Returns
this RDFFileDataSource instance
- _abc_impl = <_abc_data object>¶
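The chaining style used by the with_* methods relies on each method returning self. The sketch below shows only that pattern: `ChainableQuerySource` is a hypothetical class, no RDF file is parsed, and prepending the prefixes to the query text is an assumption about how the two strings might be combined.

```python
from typing import Optional


class ChainableQuerySource:
    """Hypothetical class illustrating the 'return self' chaining style."""

    def __init__(self, filename: Optional[str] = None):
        self.filename = filename
        self.query = ""

    def with_file(self, file: str) -> "ChainableQuerySource":
        self.filename = file
        return self  # returning self is what makes chaining possible

    def with_query(self, sparql_query: str, prefixes: str = "") -> "ChainableQuerySource":
        # Assumption: prefixes are simply placed in front of the query text.
        self.query = (prefixes + " " + sparql_query).strip()
        return self


source = ChainableQuerySource().with_file("data.ttl").with_query(
    "SELECT * WHERE { ?s ?p ?o }",
    prefixes="PREFIX ex: <http://example.org/>",
)
```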
eds4jinja2.adapters.namespace_handler module¶
This module deals with namespace management over Pandas DataFrames:
- Discovery of prefixes and namespaces in a DataFrame
- prefix.cc lookup
- Maintenance of a namespace inventory
- Shortening of URIs to their QName forms
- class NamespaceInventory(namespace_definition_dict=None)[source]¶
Bases:
rdflib.namespace.NamespaceManager
- uri_to_qname(uri_string, prefix_cc_lookup=True, error_fail=False)[source]¶
Transform the uri_string to a qname string and remember the namespace. If the namespace is not defined, the prefix can be looked up on prefix.cc
- Parameters
error_fail – whether the errors shall fail hard or just issue a warning
prefix_cc_lookup – whether to look up the namespace on prefix.cc in case it is unknown
uri_string – the string of a URI to be reduced to a QName
- Returns
qname string
- qname_to_uri(qname_string: str, prefix_cc_lookup=True, error_fail=False) str [source]¶
Transform the QName into a URI
- Parameters
qname_string – the qname string to be expanded to URI
error_fail – whether the errors shall fail hard or just issue a warning
prefix_cc_lookup – whether to look up missing prefixes at http://prefix.cc
- Returns
the absolute URI string
- simplify_uris_in_tabular(data_frame: pandas.core.frame.DataFrame, namespace_inventory: eds4jinja2.adapters.namespace_handler.NamespaceInventory, target_columns: Optional[List] = None, prefix_cc_lookup=True, inplace=True, error_fail=True) pandas.core.frame.DataFrame [source]¶
Replace the full URIs by their qname counterparts. Discover the namespaces in the process, if the namespaces are not defined.
- Parameters
namespace_inventory – the namespace inventory to be used for replacement resolution
error_fail – whether an error in a data_frame cell shall fail hard or just issue a warning
inplace – indicate whether the current data_frame shall be modified or a new one be created instead
prefix_cc_lookup – whether to look up unknown namespaces on prefix.cc
target_columns – the target columns to explore; these columns are expected to contain only URIs as values
data_frame – the dataframe to explore
- Returns
the DataFrame with replaced values
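The shortening of URIs to QNames can be sketched without rdflib or pandas. The helper below is hypothetical and deliberately simplified: it works on a plain prefix-to-namespace dictionary, performs no prefix.cc lookup, and leaves unknown URIs unchanged rather than warning or failing.

```python
from typing import Dict


def shorten_uri(uri: str, namespaces: Dict[str, str]) -> str:
    """Hypothetical helper: replace a known namespace by its prefix."""
    # Try the longest namespace first so that nested namespaces resolve correctly.
    by_length = sorted(namespaces.items(), key=lambda item: len(item[1]), reverse=True)
    for prefix, namespace in by_length:
        if uri.startswith(namespace) and len(uri) > len(namespace):
            return f"{prefix}:{uri[len(namespace):]}"
    return uri  # unknown namespace: leave the URI untouched


namespaces = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "skos": "http://www.w3.org/2004/02/skos/core#",
}
short = shorten_uri("http://www.w3.org/2004/02/skos/core#Concept", namespaces)
unknown = shorten_uri("http://example.org/thing", namespaces)
```

Applying such a function to every cell of the target columns gives the effect described for simplify_uris_in_tabular, under the assumption that those columns hold only URIs.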
eds4jinja2.adapters.prefix_cc_fetcher module¶
- prefix_cc_lookup_prefix(prefix: str) Dict [source]¶
Look up a prefix via the prefix.cc API and return the base namespace.
eds4jinja2.adapters.remote_sparql_ds module¶
- class SPARQLClientPool(*args, **kwargs)[source]¶
Bases:
eds4jinja2.adapters.remote_sparql_ds.SPARQLClientPool
A singleton connection pool that hosts a dictionary mapping each endpoint URL to a corresponding SPARQLWrapper object connected to it.
The rationale of this connection pool is to reuse connection objects and save time.
- inited = False¶
- inst = <eds4jinja2.adapters.remote_sparql_ds.SPARQLClientPool object>¶
- classmethod instance()¶
Get the singleton instance.
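The idea of a singleton pool that reuses one client per endpoint can be sketched as follows. `ClientPoolSketch` and `client_for` are hypothetical names, and a plain dict stands in for the SPARQLWrapper object, so the sketch shows the reuse logic only.

```python
class ClientPoolSketch:
    """Hypothetical pool: one shared instance, one client object per endpoint URL."""

    _instance = None

    def __init__(self):
        self.clients = {}

    @classmethod
    def instance(cls):
        # Create the shared pool on first use, then always return the same object.
        if cls._instance is None:
            cls._instance = cls()
        return cls._instance

    def client_for(self, endpoint_url: str):
        # A plain dict stands in for a real client object here.
        if endpoint_url not in self.clients:
            self.clients[endpoint_url] = {"endpoint": endpoint_url}
        return self.clients[endpoint_url]


pool = ClientPoolSketch.instance()
first = pool.client_for("http://example.org/sparql")
second = ClientPoolSketch.instance().client_for("http://example.org/sparql")
```

Both requests return the very same object, which is the time saving the pool is meant to provide.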
- class RemoteSPARQLEndpointDataSource(endpoint_url)[source]¶
Bases:
eds4jinja2.adapters.base_data_source.DataSource
Fetches data from a SPARQL endpoint. Can be used either with a SPARQL query or a URI to be described.
To query a SPARQL endpoint and get the results as a dict object
>>> ds = RemoteSPARQLEndpointDataSource(sparql_endpoint_url)
>>> dict_object = ds.with_query(sparql_query_text)._fetch_tree()
Unpack the content and error for fail-safe fetching
>>> dict_object, error_string = ds.with_query(sparql_query_text).fetch_tree()
To describe a URI and get the results as a pandas DataFrame
>>> pd_dataframe = ds.with_uri(existent_uri)._fetch_tabular()
Unpack the content and error for fail-safe fetching
>>> pd_dataframe, error_string = ds.with_uri(existent_uri).fetch_tabular()
To target the description of a URI from a named graph
>>> pd_dataframe, error_string = ds.with_uri(existent_uri, named_graph).fetch_tabular()
- with_query(sparql_query: str, substitution_variables: Optional[dict] = None, sparql_prefixes: str = '') eds4jinja2.adapters.remote_sparql_ds.RemoteSPARQLEndpointDataSource [source]¶
Set the query text and return the reference to self for chaining.
- Returns
this RemoteSPARQLEndpointDataSource instance
- with_query_from_file(sparql_query_file_path: str, substitution_variables: Optional[dict] = None, prefixes: str = '') eds4jinja2.adapters.remote_sparql_ds.RemoteSPARQLEndpointDataSource [source]¶
Set the query text from the content of a file and return the reference to self for chaining.
- Returns
this RemoteSPARQLEndpointDataSource instance
- with_uri(uri: str, graph_uri: Optional[str] = None) eds4jinja2.adapters.remote_sparql_ds.RemoteSPARQLEndpointDataSource [source]¶
Set the URI to be described, optionally within a named graph, and return the reference to self for chaining.
- Returns
this RemoteSPARQLEndpointDataSource instance
- _abc_impl = <_abc_data object>¶
eds4jinja2.adapters.substitution_template module¶
- class SubstitutionTemplate(template)[source]¶
Bases:
string.Template
- delimiter = '~'¶
- pattern = re.compile('\n \\~(?:\n (?P<escaped>\\~) | # Escape sequence of two delimiters\n (?P<named>(?a:[_a-z][_a-z0-9]*)) | # delimiter and a Python identifier\n {(?P<braced>(?a:[_a-z][_a-z0-9, re.IGNORECASE|re.VERBOSE)¶
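Since the class derives from string.Template and sets the delimiter to '~', its behaviour can be illustrated with an equivalent stand-in. `TildeTemplate` is a hypothetical name used here so the example runs without the package; the query text and values are made up.

```python
from string import Template


class TildeTemplate(Template):
    """Stand-in mirroring the documented delimiter of SubstitutionTemplate."""

    delimiter = "~"


# Placeholders use ~name or ~{name}; SPARQL variables such as ?p are left alone.
query = TildeTemplate("SELECT ?p ?o WHERE { ~subject ?p ?o } LIMIT ~{limit}")
rendered = query.substitute(subject="<http://example.org/a>", limit=10)

# A doubled delimiter yields a literal '~', and '$' has no special meaning.
literal = TildeTemplate("~~5 and $price").substitute()
```

A tilde delimiter is convenient for SPARQL because the default '$' of string.Template can also introduce a SPARQL variable.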
eds4jinja2.adapters.tabular_utils module¶
A set of helper functions to easily polish tabular data (i.e. Pandas DataFrames).
- replace_strings_in_tabular(data_frame: pandas.core.frame.DataFrame, target_columns: Optional[List[str]] = None, value_mapping_dict: Optional[Dict] = None, mark_touched_rows: bool = False) List[str] [source]¶
Replaces the values from the target columns in a data frame according to the value-mapping dictionary. If mark_touched_rows is true, then a boolean column _touched_ is added, indicating which rows were updated.
>>> value_mapping_dict = {"old value 1": "new value 1", "old value 2": "new value 2"}
- Parameters
mark_touched_rows – add a new boolean column _touched_ indicating which rows were updated
value_mapping_dict – the string substitution mapping
target_columns – a list of column names; leave empty if the substitution applies to all columns
data_frame – the data frame
- Returns
the list of unique strings found in the dataframe
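The substitution logic can be sketched without pandas on a list of dict rows. This is a simplified, hypothetical analogue: `replace_strings_in_rows` is not part of the package, and unlike the documented function it returns the modified rows rather than the list of unique strings.

```python
def replace_strings_in_rows(rows, target_columns=None, value_mapping_dict=None,
                            mark_touched_rows=False):
    """Hypothetical, pandas-free analogue working on a list of dict rows (in place)."""
    mapping = value_mapping_dict or {}
    for row in rows:
        # An empty target_columns means that every column is considered.
        columns = target_columns if target_columns else list(row.keys())
        touched = False
        for column in columns:
            if row[column] in mapping:
                row[column] = mapping[row[column]]
                touched = True
        if mark_touched_rows:
            row["_touched_"] = touched
    return rows


rows = [
    {"status": "old value 1", "note": "keep"},
    {"status": "other", "note": "old value 2"},
]
result = replace_strings_in_rows(
    rows,
    target_columns=["status"],
    value_mapping_dict={"old value 1": "new value 1", "old value 2": "new value 2"},
    mark_touched_rows=True,
)
```

Only the status column is inspected here, so "old value 2" in the note column stays as it is and the second row is marked as untouched.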
- add_relative_figures(data_frame: pandas.core.frame.DataFrame, target_columns: List[str], relativisers: List, percentage: bool = True)[source]¶
For each of the target columns, add a calculated column with relative values computed from the provided relativisers.
- Parameters
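percentage – whether the relative values shall be expressed as percentages
data_frame – the data frame
target_columns – the columns for which relative values are calculated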
relativisers – a list of indicators corresponding to the target_columns comprising either None, a number or a column name
- Returns
Module contents¶
- sort_by_size_and_alphabet(l: List) List [source]¶
Sort an iterable by size and alphabetically
- Parameters
l – the list to be sorted
- Returns
the sorted list
- first_key(d: typing.Dict, None) object [source]¶
Return the first dict key, where the keys are ordered first by their length and then alphabetically.
- first_key_value(d: typing.Dict, None) object [source]¶
Return the dict value for the first key in the dict. The first key is determined using the first_key function.
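The ordering described for these helpers can be sketched with a tuple sort key. The function names below are hypothetical, and returning None for an empty dict is an assumption.

```python
from typing import Dict, List


def sort_by_size_then_alphabet(items: List[str]) -> List[str]:
    # Tuples compare element by element: length first, then the string itself.
    return sorted(items, key=lambda item: (len(item), item))


def first_key_sketch(d: Dict[str, object]):
    keys = sort_by_size_then_alphabet(list(d))
    return keys[0] if keys else None  # assumption: None for an empty dict


data = {"beta": 2, "b": 1, "a": 0}
ordered = sort_by_size_then_alphabet(["pear", "fig", "apple", "kiwi"])
key = first_key_sketch(data)
value = data[key]
```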
- invert_dict(mapping_dict: Dict, reduce_values: bool = True)[source]¶
Invert the dictionary by swapping keys and values. If the values are unique, the inverted dict will be of the same size as the initial one. Otherwise it will be shrunk to the unique values and the keys will be accumulated in a list.
The list can be reduced to a single item by setting reduce_values=True.
>>> d = {"a":1, "b":2, "c":1}
>>> reduced_d = invert_dict(d)
{1: 'a', 2: 'b'}
>>> unreduced_d = invert_dict(d, False)
{1: ['a', 'c'], 2: ['b']}
- Parameters
reduce_values – If reduce_values is true then the values are single items otherwise the values are list of possibly multiple items.
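One way to obtain the behaviour shown in the example is sketched below. `invert_dict_sketch` is a hypothetical re-implementation, and keeping the first collected key when reducing is an assumption; the package may select the key differently.

```python
def invert_dict_sketch(mapping_dict, reduce_values=True):
    """Hypothetical re-implementation of the inversion described above."""
    inverted = {}
    for key, value in mapping_dict.items():
        # Collect every key that maps to the same value.
        inverted.setdefault(value, []).append(key)
    if reduce_values:
        # Assumption: keep the first collected key for each value.
        return {value: keys[0] for value, keys in inverted.items()}
    return inverted


d = {"a": 1, "b": 2, "c": 1}
reduced_d = invert_dict_sketch(d)
unreduced_d = invert_dict_sketch(d, False)
```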
- deep_update(source, overrides)[source]¶
Update a nested dictionary or similar mapping. Modify source in place. Adapted from https://stackoverflow.com/questions/3232943/update-value-of-a-nested-dictionary-of-varying-depth
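A recursive in-place update of this kind can be sketched as follows. `deep_update_sketch` is a hypothetical name and the code is an illustration of the technique, not necessarily identical to the package's function.

```python
from collections.abc import Mapping


def deep_update_sketch(source, overrides):
    """Hypothetical sketch: merge overrides into source, recursing into nested mappings."""
    for key, value in overrides.items():
        if isinstance(value, Mapping) and isinstance(source.get(key), Mapping):
            # Both sides are mappings: merge them instead of replacing the whole branch.
            deep_update_sketch(source[key], value)
        else:
            source[key] = value
    return source


config = {"db": {"host": "localhost", "port": 5432}, "debug": False}
deep_update_sketch(config, {"db": {"port": 6543}, "debug": True})
```

Unlike dict.update, the nested "db" mapping keeps its "host" entry while only "port" is overridden.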