
Implement framework to validate backwards compatibility of metrics #6278

@yurishkuro

Description

Jaeger binaries produce various metrics that can be used to monitor Jaeger itself in production, such as throughput, queue saturation, and error rates. We have historically treated those metrics as a stable public API (we even provide a Grafana dashboard mixin), but we have never had proper integration tests that validate that code changes do not break the metrics.

Proposal

We can enhance our existing integration tests to also compare the metrics. The full set of all possible metrics is never available from a single run, because if some components are not utilized (e.g. adaptive sampling or a particular storage type) then their metrics are not registered. However, since our integration tests cover most of the components, we can scrape the metrics endpoint at the end of each test and combine the results into a full picture.
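A minimal sketch of such a scraper is below, assuming the binary exposes a Prometheus-style /metrics endpoint; the localhost:8888 address and the metrics_snapshot.txt output file are placeholders for illustration, not the actual CIT configuration:

```go
package main

import (
	"fmt"
	"io"
	"net/http"
	"os"
	"sort"
	"strings"
)

// scrapeMetricNames fetches a Prometheus-format metrics page and returns the
// distinct "name{labels}" series it exposes, sorted for stable diffs.
func scrapeMetricNames(endpoint string) ([]string, error) {
	resp, err := http.Get(endpoint)
	if err != nil {
		return nil, err
	}
	defer resp.Body.Close()
	body, err := io.ReadAll(resp.Body)
	if err != nil {
		return nil, err
	}
	seen := map[string]bool{}
	for _, line := range strings.Split(string(body), "\n") {
		// Skip comments (# HELP / # TYPE) and blank lines.
		if line == "" || strings.HasPrefix(line, "#") {
			continue
		}
		// Keep the series identity (name plus labels) and drop the sample value.
		name := line
		if idx := strings.LastIndex(line, "}"); idx >= 0 {
			name = line[:idx+1]
		} else if idx := strings.Index(line, " "); idx >= 0 {
			name = line[:idx]
		}
		seen[name] = true
	}
	names := make([]string, 0, len(seen))
	for n := range seen {
		names = append(names, n)
	}
	sort.Strings(names)
	return names, nil
}

func main() {
	names, err := scrapeMetricNames("http://localhost:8888/metrics")
	if err != nil {
		fmt.Fprintln(os.Stderr, "scrape failed:", err)
		os.Exit(1)
	}
	// One series per line; this file becomes the per-test workflow artifact.
	if err := os.WriteFile("metrics_snapshot.txt", []byte(strings.Join(names, "\n")+"\n"), 0o644); err != nil {
		fmt.Fprintln(os.Stderr, "write failed:", err)
		os.Exit(1)
	}
}
```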

Caveat: the exact shape of the metrics depends on all the nested namespaces applied to the metrics.Factory, so it is sensitive to the exact code in the main functions, which is where metrics.Factory always originates. Our integration tests for Jaeger v2 usually exercise the actual binary, so the resulting metrics reflect how that binary behaves in production. But all integration tests for v1 run from a unit testing framework, and their metrics.Factory initialization may not match how it is done in the main functions. So we may only be able to solve this for Jaeger v2, which is fine.

Approach

  • At the end of each integration test (CIT) workflow we scrape the metrics collected by the binary and upload them as a GitHub artifact.
  • Then a final workflow can gather all those artifacts and compare them with similar reports from the latest release (see the comparison sketch after this list). If differences are found, it can upload them as another artifact and link to the PR so that maintainers can inspect the changes and decide whether they are acceptable.
  • The artifacts uploaded for the official release can also be referenced from the documentation website as a way of documenting the current collection of metrics.
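A hedged sketch of the comparison step, assuming each artifact is a plain-text file with one metric series per line (as produced by a scraping helper); the file names metrics_release.txt and metrics_pr.txt are placeholders:

```go
package main

import (
	"fmt"
	"os"
	"sort"
	"strings"
)

// readSet loads a snapshot file with one metric series per line into a set.
func readSet(path string) (map[string]bool, error) {
	data, err := os.ReadFile(path)
	if err != nil {
		return nil, err
	}
	set := map[string]bool{}
	for _, line := range strings.Split(string(data), "\n") {
		line = strings.TrimSpace(line)
		if line != "" {
			set[line] = true
		}
	}
	return set, nil
}

// missingFrom returns entries present in a but absent from b, sorted.
func missingFrom(a, b map[string]bool) []string {
	var out []string
	for k := range a {
		if !b[k] {
			out = append(out, k)
		}
	}
	sort.Strings(out)
	return out
}

func main() {
	release, err := readSet("metrics_release.txt") // snapshot from the latest release
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	current, err := readSet("metrics_pr.txt") // combined snapshot from the current PR
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	for _, m := range missingFrom(release, current) {
		fmt.Println("REMOVED:", m) // candidate breaking change
	}
	for _, m := range missingFrom(current, release) {
		fmt.Println("ADDED:  ", m) // usually benign, but should be documented
	}
}
```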

Help Wanted

We seek community help to implement this functionality. This is not a 30-minute fix, but we still marked it as a good-first-issue because it can be done incrementally.

Tasks

  • Currently all CIT workflows are independent. To run a final job once all CIT jobs are finished, we may need to combine them into a single CIT workflow with multiple jobs.
  • The ability to scrape and compare metrics was implemented in [V2] Add Script for metrics markdown table #5941. We need to integrate the scraping into each CIT workflow (using a helper script) and upload the output as workflow artifacts.
  • Implement the validation job in a workflow that compares artifacts from the current PR with those from the latest release and generates a diff (also uploaded as a separate artifact).
  • Make the validation job post some form of summary as a comment on the PR (or as the output of the Check); see the sketch after this list.
  • Implement a way to incorporate the metrics report into the documentation website.
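For the summary task, one possible approach (a sketch under stated assumptions, not the actual implementation) is to render the diff as markdown and either append it to the file named by the GITHUB_STEP_SUMMARY environment variable, which GitHub Actions provides to jobs, or hand it to a step that posts a PR comment. The diff inputs below are placeholders:

```go
package main

import (
	"fmt"
	"os"
	"strings"
)

// renderSummary formats the metric diff as markdown suitable for a PR comment
// or a GitHub Actions job summary.
func renderSummary(added, removed []string) string {
	var b strings.Builder
	b.WriteString("### Metrics compatibility report\n\n")
	if len(added) == 0 && len(removed) == 0 {
		b.WriteString("No changes to emitted metrics compared to the latest release.\n")
		return b.String()
	}
	if len(removed) > 0 {
		fmt.Fprintf(&b, "**Removed (%d)**, potential breaking changes:\n\n", len(removed))
		for _, m := range removed {
			b.WriteString("- `" + m + "`\n")
		}
		b.WriteString("\n")
	}
	if len(added) > 0 {
		fmt.Fprintf(&b, "**Added (%d)**:\n\n", len(added))
		for _, m := range added {
			b.WriteString("- `" + m + "`\n")
		}
	}
	return b.String()
}

func main() {
	// Placeholder inputs; in the real job these would come from the diff step.
	md := renderSummary([]string{"jaeger_new_metric_example_total"}, nil)

	// GITHUB_STEP_SUMMARY is a file path provided by GitHub Actions; appending
	// markdown to it shows the report on the workflow run page.
	if path := os.Getenv("GITHUB_STEP_SUMMARY"); path != "" {
		f, err := os.OpenFile(path, os.O_APPEND|os.O_CREATE|os.O_WRONLY, 0o644)
		if err == nil {
			defer f.Close()
			_, _ = f.WriteString(md)
			return
		}
	}
	fmt.Print(md)
}
```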


Labels

enhancement · good first issue · help wanted · v2
