Using the aimpf library

Last updated: September 04, 2024

Contextualize has created a library for users to more easily access files uploaded through the MQTT pipeline to the db38 database. Features of this library are in development and this guide will be updated frequently as new features are added. This guide will help you configure your environment, authenticate with CartaAgent, and use the dispatcher to interact with data in the db38 database. There is also a low-code/no-code way of using this library available on Seven Bridges.

Clone Repository

The aimpf library is available through Contextualize on Gitlab: AIMPF Library. At this time, the most recent updates are available in the chen-dev branch but a pipeline to merging these changes into main will be developed. Please note the use of this chen-dev branch now, and as described in the installation in the next step. A second Contextualize-backed repository is a necessary dependence for this package to perform and is also available on Gitlab: pycarta

Configure environment

Experienced users may have their own desired virtual environment methods and management and are welcome to continue with their own methods.

To configure the virtual env, run the following commands in terminal in the same directory as this file:

Linux/macOS:

virtualenv venv -p 3.11  # Create a virtual environment using python3.11
source venv/bin/activate  # Activate the environment
pip install git+https://gitlab.com/contextualize/pycarta.git  # Install pycarta
pip install git+https://gitlab.com/contextualize-public/aimpf/aimpf.git@chen-dev  # Install aimpf

Powershell:

virtualenv -p 3.11 venv # Create a virtual environment using python3.11
venv/Scripts/activate  # Activate the environment
pip install git+https://gitlab.com/contextualize/pycarta.git  # Install pycarta
pip install git+https://gitlab.com/contextualize-public/aimpf/aimpf.git@chen-dev  # Install aimpf

Note: As a reminder, it is generally recommended to deactivate and reactivate the virtual environment after installing packages and before continuing.

Authenticate with CartaAgent

Before you try to authenticate with CartaAgent, ensure you have a Carta account (at this time, as a development/sandbox user). If you do not have an account, or only have a productive-level account, please email customer.support@contextualize.us.com.

Note

At any point after confirming the username and password, a user can choose to use the command line tool built off of the functions described through the rest of this case study. To explore the command line tool, run the below command from your desired terminal to access further help instructions.

db38pulling --help

While there are a number of ways to authenticate, the example below uses a “.env” file storing the users credentials as variables “CARTA_ROOT_USERNAME” and “CARTA_ROOT_PASSWORD”. You may update the example below to use your preferred method of using secrets, understanding that they must get passed to the CartaAgent as ‘username’ and ‘password’.

Note: The code described from this point on is also available in the repository in Jupyter Notebooks within the ‘tests’ directory.

import os
from pycarta.auth import CartaAgent
from aimpf.dispatcher.mysql import *

# Create authenticated CartaAgent object, which will be used to access the Carta services in the following steps.
# The CARTA_ROOT_USERNAME and CARTA_ROOT_PASSWORD are the credentials of the root user of the Carta account. We suggest that the credentials be stored in the .env file located in the root directory of the project.
# The url is optional and the default value are shown below.
agent = CartaAgent(
    username=os.getenv("CARTA_ROOT_USERNAME"),
    password=os.getenv("CARTA_ROOT_PASSWORD"),
    url="https://api.sandbox.carta.contextualize.us.com"
)

Data hierarchy

Before proceeding, we should get the basic idea about the structure of a typical MySQL data resource.

1. Resources

Resources refer to the overall database server, which can manage multiple databases. A resource encompasses the server hardware, the MySQL software instance, and all databases managed by that instance. It’s the top-level entity in the hierarchy.

2. Databases

A database is a collection of related data organized in a structured way. It is a logical container that holds various tables, views, stored procedures, functions, and other objects. In MySQL, each database is a separate entity that can contain multiple tables and other structures.

3. Tables

A table is a collection of related data entries and is the fundamental building block of a database. Tables are organized into rows and columns. Each table represents a specific entity (e.g., customers, orders, products) and contains data about that entity. Tables are defined by their schema, which includes the table name and the columns that comprise the table.

4. Columns

A column (or field) is a single piece of data in a table and represents a specific attribute of the entity described by the table. Each column has a name and a data type (e.g., INTEGER, VARCHAR, DATE). Columns define the structure of the table by specifying the kind of data that can be stored in each position.

5. Records

A record (or row) is a single entry in a table and represents a single instance of the entity described by the table. Each record contains data corresponding to one row of columns in the table. Records hold the actual data that populates the table, with each value in a record matching the data type specified by the corresponding column.

Using the dispatcher

Initialize the dispatcher

There are two ways of using the dispatcher to access the data available: by using a direct URL, or by using a registered Carta service.

Method 1: Using dispatcher URL

This is usually during the development stage, since the host address is usually unknown to the user, and at the time of this case study is the method to follow.

# Method 1: Initialize the dispatcher by providing the agent and the dispatcher URL
host = "ow1ygc0epd.execute-api.us-east-2.amazonaws.com/prod"
dispatcher_url = f"https://{host}"
dbs = DbSubscriber(agent, url=dispatcher_url)

# DbSubscriber is the base class for the classes below
ctxt = Ctxt(agent, url=dispatcher_url)
db41 = Db41(agent, url=dispatcher_url)
db38 = Db38(agent, url=dispatcher_url)
Method 2: Using the registered Carta service

This is typically recommended and should be used when services are registered

# Method 2: Using the registered Carta service, note that both the namespace and the service are required, and only users with the correct permissions can access the service. In this case the birdshot user does not have the permission while the carta root and admin users have.
dbs = DbSubscriber(agent, namespace="aimpf", service="mysql")
dbs.is_alive()  # If one used a user does not have the permission to access the service, there would be an error

# DbSubscriber is the base class for the classes below
ctxt = Ctxt(agent, namespace="aimpf", service="mysql")
db41 = Db41(agent, namespace="aimpf", service="mysql")
db38 = Db38(agent, namespace="aimpf", service="mysql")

Using Dispatcher Functions

Here are some common functions you can use to interact with the database:

Check Status

# These functions do not need the resources to be specified
dbs.is_alive()
# dbs.keywords(columns=["dataItemId", "dateTime"], where="assetId=Okuma-4020", start="2020-01-01", end="2022-12-31", limit=5)

Database metadata

subs.list_resources()  # List all resources
# ctxt.list_databases()  # List all databases for the resource "ctxt"
# ctxt.list_columns("ctxt", "Messages")  # List all columns for the table "Messages" in the database "ctxt"
ctxt.list_tables(database="ctxt")  # List all tables in the database "ctxt"

Distinct values

ctxt.distinct(database="ctxt", table="Messages_all", columns="assetId")
# ctxt.distinct(database="ctxt", table="Messages_all", columns="assetId", limit=3)
# ctxt.distinct(database="ctxt", table="Messages_all", columns="assetId", start="2020-01-01", end="2024-04-14")
# ctxt.distinct(database="ctxt", table="Messages_all", columns="assetId", start="2020-01-01", end="2022-12-31", limit=5)
# ctxt.distinct(database="ctxt", table="Messages_all", columns=["dataItemId", "dateTime"], where="assetId=Okuma-4020", start="2020-01-01", end="2022-12-31", limit=10)

Count records

# ctxt.count(database="ctxt", table="Messages_all")
db38.count(database="ProcessData", table="PositionData_Fanuc")
# ctxt.count(database="ctxt", table="Messages_all", where="assetId=Okuma-4020")
# ctxt.count(database="ctxt", table="Messages_all", where="assetId=Okuma-4020", start="2020-01-01", end="2022-12-31")

List records

# ctxt.list(database="ctxt", table="Messages_all", limit=50)
# ctxt.list(database="ctxt", table="Messages_all", start="2022-09-30T17:53:00", limit=15)
# ctxt.list(database="ctxt", table="Messages_all", end="2022-12-31", limit=15)
# ctxt.list(database="ctxt", table="Messages_all", where="assetId=Okuma-4020", limit=20)
# ctxt.list(database="ctxt", table="Messages_all", where="assetId=Okuma-4020", start="2020-08-16", end="2024-12-31", limit=10)
ctxt.list(database="ctxt", table="Messages", where="Id<=913,dataItemId=Heartbeat", limit=10)