Using the aimpf library
Last updated: September 04, 2024
Contextualize has created a library for users to more easily access files uploaded through the MQTT pipeline to the db38 database. Features of this library are in development and this guide will be updated frequently as new features are added. This guide will help you configure your environment, authenticate with CartaAgent, and use the dispatcher to interact with data in the db38 database. There is also a low-code/no-code way of using this library available on Seven Bridges.
Clone Repository
The aimpf library is available through Contextualize on Gitlab: AIMPF Library. At this time, the most recent updates are available in the chen-dev branch but a pipeline to merging these changes into main will be developed. Please note the use of this chen-dev branch now, and as described in the installation in the next step. A second Contextualize-backed repository is a necessary dependence for this package to perform and is also available on Gitlab: pycarta
Configure environment
Experienced users may have their own desired virtual environment methods and management and are welcome to continue with their own methods.
To configure the virtual env, run the following commands in terminal in the same directory as this file:
Linux/macOS:
virtualenv venv -p 3.11 # Create a virtual environment using python3.11
source venv/bin/activate # Activate the environment
pip install git+https://gitlab.com/contextualize/pycarta.git # Install pycarta
pip install git+https://gitlab.com/contextualize-public/aimpf/aimpf.git@chen-dev # Install aimpf
Powershell:
virtualenv -p 3.11 venv # Create a virtual environment using python3.11
venv/Scripts/activate # Activate the environment
pip install git+https://gitlab.com/contextualize/pycarta.git # Install pycarta
pip install git+https://gitlab.com/contextualize-public/aimpf/aimpf.git@chen-dev # Install aimpf
Note: As a reminder, it is generally recommended to deactivate and reactivate the virtual environment after installing packages and before continuing.
Authenticate with CartaAgent
Before you try to authenticate with CartaAgent, ensure you have a Carta account (at this time, as a development/sandbox user). If you do not have an account, or only have a productive-level account, please email customer.support@contextualize.us.com.
Note
At any point after confirming the username and password, a user can choose to use the command line tool built off of the functions described through the rest of this case study. To explore the command line tool, run the below command from your desired terminal to access further help instructions.
db38pulling --help
While there are a number of ways to authenticate, the example below uses a “.env” file storing the users credentials as variables “CARTA_ROOT_USERNAME” and “CARTA_ROOT_PASSWORD”. You may update the example below to use your preferred method of using secrets, understanding that they must get passed to the CartaAgent as ‘username’ and ‘password’.
Note: The code described from this point on is also available in the repository in Jupyter Notebooks within the ‘tests’ directory.
import os
from pycarta.auth import CartaAgent
from aimpf.dispatcher.mysql import *
# Create authenticated CartaAgent object, which will be used to access the Carta services in the following steps.
# The CARTA_ROOT_USERNAME and CARTA_ROOT_PASSWORD are the credentials of the root user of the Carta account. We suggest that the credentials be stored in the .env file located in the root directory of the project.
# The url is optional and the default value are shown below.
agent = CartaAgent(
username=os.getenv("CARTA_ROOT_USERNAME"),
password=os.getenv("CARTA_ROOT_PASSWORD"),
url="https://api.sandbox.carta.contextualize.us.com"
)
Data hierarchy
Before proceeding, we should get the basic idea about the structure of a typical MySQL data resource.
1. Resources
Resources refer to the overall database server, which can manage multiple databases. A resource encompasses the server hardware, the MySQL software instance, and all databases managed by that instance. It’s the top-level entity in the hierarchy.
2. Databases
A database is a collection of related data organized in a structured way. It is a logical container that holds various tables, views, stored procedures, functions, and other objects. In MySQL, each database is a separate entity that can contain multiple tables and other structures.
3. Tables
A table is a collection of related data entries and is the fundamental building block of a database. Tables are organized into rows and columns. Each table represents a specific entity (e.g., customers, orders, products) and contains data about that entity. Tables are defined by their schema, which includes the table name and the columns that comprise the table.
4. Columns
A column (or field) is a single piece of data in a table and represents a specific attribute of the entity described by the table. Each column has a name and a data type (e.g., INTEGER, VARCHAR, DATE). Columns define the structure of the table by specifying the kind of data that can be stored in each position.
5. Records
A record (or row) is a single entry in a table and represents a single instance of the entity described by the table. Each record contains data corresponding to one row of columns in the table. Records hold the actual data that populates the table, with each value in a record matching the data type specified by the corresponding column.
Using the dispatcher
Initialize the dispatcher
There are two ways of using the dispatcher to access the data available: by using a direct URL, or by using a registered Carta service.
- Method 1: Using dispatcher URL
This is usually during the development stage, since the host address is usually unknown to the user, and at the time of this case study is the method to follow.
# Method 1: Initialize the dispatcher by providing the agent and the dispatcher URL
host = "ow1ygc0epd.execute-api.us-east-2.amazonaws.com/prod"
dispatcher_url = f"https://{host}"
dbs = DbSubscriber(agent, url=dispatcher_url)
# DbSubscriber is the base class for the classes below
ctxt = Ctxt(agent, url=dispatcher_url)
db41 = Db41(agent, url=dispatcher_url)
db38 = Db38(agent, url=dispatcher_url)
- Method 2: Using the registered Carta service
This is typically recommended and should be used when services are registered
# Method 2: Using the registered Carta service, note that both the namespace and the service are required, and only users with the correct permissions can access the service. In this case the birdshot user does not have the permission while the carta root and admin users have.
dbs = DbSubscriber(agent, namespace="aimpf", service="mysql")
dbs.is_alive() # If one used a user does not have the permission to access the service, there would be an error
# DbSubscriber is the base class for the classes below
ctxt = Ctxt(agent, namespace="aimpf", service="mysql")
db41 = Db41(agent, namespace="aimpf", service="mysql")
db38 = Db38(agent, namespace="aimpf", service="mysql")
Using Dispatcher Functions
Here are some common functions you can use to interact with the database:
Check Status
# These functions do not need the resources to be specified
dbs.is_alive()
# dbs.keywords(columns=["dataItemId", "dateTime"], where="assetId=Okuma-4020", start="2020-01-01", end="2022-12-31", limit=5)
Database metadata
subs.list_resources() # List all resources
# ctxt.list_databases() # List all databases for the resource "ctxt"
# ctxt.list_columns("ctxt", "Messages") # List all columns for the table "Messages" in the database "ctxt"
ctxt.list_tables(database="ctxt") # List all tables in the database "ctxt"
Distinct values
ctxt.distinct(database="ctxt", table="Messages_all", columns="assetId")
# ctxt.distinct(database="ctxt", table="Messages_all", columns="assetId", limit=3)
# ctxt.distinct(database="ctxt", table="Messages_all", columns="assetId", start="2020-01-01", end="2024-04-14")
# ctxt.distinct(database="ctxt", table="Messages_all", columns="assetId", start="2020-01-01", end="2022-12-31", limit=5)
# ctxt.distinct(database="ctxt", table="Messages_all", columns=["dataItemId", "dateTime"], where="assetId=Okuma-4020", start="2020-01-01", end="2022-12-31", limit=10)
Count records
# ctxt.count(database="ctxt", table="Messages_all")
db38.count(database="ProcessData", table="PositionData_Fanuc")
# ctxt.count(database="ctxt", table="Messages_all", where="assetId=Okuma-4020")
# ctxt.count(database="ctxt", table="Messages_all", where="assetId=Okuma-4020", start="2020-01-01", end="2022-12-31")
List records
# ctxt.list(database="ctxt", table="Messages_all", limit=50)
# ctxt.list(database="ctxt", table="Messages_all", start="2022-09-30T17:53:00", limit=15)
# ctxt.list(database="ctxt", table="Messages_all", end="2022-12-31", limit=15)
# ctxt.list(database="ctxt", table="Messages_all", where="assetId=Okuma-4020", limit=20)
# ctxt.list(database="ctxt", table="Messages_all", where="assetId=Okuma-4020", start="2020-08-16", end="2024-12-31", limit=10)
ctxt.list(database="ctxt", table="Messages", where="Id<=913,dataItemId=Heartbeat", limit=10)