Tracing SDK How-to Guides
Render Trace inside Jupyter Notebook
Jupyter integration is available in MLflow 2.20 and above
The trace UI is also available within Jupyter notebooks!
This feature requires using an MLflow Tracking Server, as
this is where the UI assets are fetched from. To get started, simply ensure that the MLflow
Tracking URI is set to your tracking server (e.g. mlflow.set_tracking_uri("http://localhost:5000")
).
By default, the trace UI will automatically be displayed for the following events:
- When the cell code generates a trace (e.g. via automatic tracing, or by running a manually traced function)
- When
mlflow.search_traces()
is called - When a
mlflow.entities.Trace()
object is displayed (e.g. via IPython'sdisplay
function, or when it is the last value returned in a cell)
To disable the display, simply call mlflow.tracing.disable_notebook_display()
, and rerun the cell
containing the UI. To enable it again, call mlflow.tracing.enable_notebook_display()
.
For a more complete example, try running this demo notebook!
Manually Creating a Trace and a Span
Please refer to the Manual Tracing guide for how to create a trace and span manually.
Setting Trace Tags
Tags can be added to traces to provide additional metadata at the trace level. For example, you can attach a session ID to a trace to group traces by a conversation session. MLflow provides APIs to set and delete tags on traces. Select the right API based on whether you want to set tags on an active trace or on an already finished trace.
API / Method | Use Case |
---|---|
Setting tags on an active trace during the code execution. | |
Programmatically setting tags on a finished trace. | |
MLflow UI | Setting tags on a finished trace conveniently. |
Setting Tags on an Active Trace
If you are using automatic tracing or fluent APIs to create traces and want to add tags to the trace during its execution, you can use the mlflow.update_current_trace()
function.
For example, the following code example adds the "fruit": "apple"
tag to the trace created for the my_func
function:
@mlflow.trace
def my_func(x):
mlflow.update_current_trace(tags={"fruit": "apple"})
return x + 1
The mlflow.update_current_trace()
function adds the specified tag(s) to the current trace when the key is not already present. If the key is already present, it updates the key with the new value.
Setting Tags on a Finished Trace
To set tags on a trace that has already been completed and logged in the backend store, use the MlflowClient.set_trace_tag
method to set a tag on a trace,
and the MlflowClient.delete_trace_tag
method to remove a tag from a trace.
# Set a tag on a trace
client.set_trace_tag(request_id=request_id, key="tag_key", value="tag_value")
# Delete a tag from a trace
client.delete_trace_tag(request_id=request_id, key="tag_key")
Setting Tags via the MLflow UI
Alternatively, you can update or delete tags on a trace from the MLflow UI. To do this, navigate to the trace tab, then click on the pencil icon next to the tag you want to update.
Delete Traces
You can delete traces based on specific criteria using the MlflowClient.delete_traces
method. This method allows you to delete traces by experiment ID,
maximum timestamp, or request IDs.
Deleting a trace is an irreversible process. Ensure that the setting provided within the delete_traces
API meet the intended range for deletion.
import time
# Get the current timestamp in milliseconds
current_time = int(time.time() * 1000)
# Delete traces older than a specific timestamp
deleted_count = client.delete_traces(
experiment_id="1", max_timestamp_millis=current_time, max_traces=10
)
Disabling Traces
To disable tracing, the mlflow.tracing.disable()
API will cease the collection of trace data from within MLflow and will not log
any data to the MLflow Tracking service regarding traces.
To enable tracing (if it had been temporarily disabled), the mlflow.tracing.enable()
API will re-enable tracing functionality for instrumented models
that are invoked.
Associating Traces to MLflow Run
If a trace is generated within a run context, the recorded traces to an active Experiment will be associated with the active Run.
For example, in the following code, the traces are generated within the start_run
context.
import mlflow
# Create and activate an Experiment
mlflow.set_experiment("Run Associated Tracing")
# Start a new MLflow Run
with mlflow.start_run() as run:
# Initiate a trace by starting a Span context from within the Run context
with mlflow.start_span(name="Run Span") as parent_span:
parent_span.set_inputs({"input": "a"})
parent_span.set_outputs({"response": "b"})
parent_span.set_attribute("a", "b")
# Initiate a child span from within the parent Span's context
with mlflow.start_span(name="Child Span") as child_span:
child_span.set_inputs({"input": "b"})
child_span.set_outputs({"response": "c"})
child_span.set_attributes({"b": "c", "c": "d"})
When navigating to the MLflow UI and selecting the active Experiment, the trace display view will show the run that is associated with the trace, as well as providing a link to navigate to the run within the MLflow UI. See the below video for an example of this in action.
You can also programmatically retrieve the traces associated to a particular Run by using the mlflow.client.MlflowClient.search_traces()
method.
from mlflow import MlflowClient
client = MlflowClient()
# Retrieve traces associated with a specific Run
traces = client.search_traces(run_id=run.info.run_id)
print(traces)
Logging Traces Asynchronously
By default, MLflow Traces are logged synchronously. This may introduce a performance overhead when logging Traces, especially when your MLflow Tracking Server is running on a remote server. If the performance overhead is a concern for you, you can enable asynchronous logging for tracing in MLflow 2.16.0 and later.
To enable async logging for tracing, call mlflow.config.enable_async_logging()
in your code. This will make the trace logging operation non-blocking and reduce the performance overhead.
import mlflow
mlflow.config.enable_async_logging()
# Traces will be logged asynchronously
with mlflow.start_span(name="foo") as span:
span.set_inputs({"a": 1})
span.set_outputs({"b": 2})
# If you don't see the traces in the UI after waiting for a while, you can manually flush the traces
# mlflow.flush_trace_async_logging()
Note that the async logging does not fully eliminate the performance overhead. Some backend calls still need to be made synchronously and there are other factors such as data serialization. However, async logging can significantly reduce the overall overhead of logging traces, empirically about ~80% for typical workloads.
The following configurations are available for async logging:
Environment Variable | Description | Default Value |
---|---|---|
MLFLOW_ASYNC_TRACE_LOGGING_MAX_WORKERS | The maximum number of worker threads to use for async trace logging. | 10 |
MLFLOW_ASYNC_TRACE_LOGGING_MAX_QUEUE_SIZE | The maximum number of traces that can be queued for logging. Traces will be discarded if the queue is full. | 1000 |
MLFLOW_ASYNC_TRACE_LOGGING_RETRY_TIMEOUT | Timeout in seconds for retrying failed trace logging. Namely, failed traces will be retried up to this timeout with backoff, after which they will be discarded. | 60 |