feat: combine go module file and go source discovery into single cataloger #4127

Open: spiffcs wants to merge 8 commits into main

@spiffcs (Contributor) commented Aug 11, 2025

Description

This PR enhances the GoModuleFileCataloger with the following improvements:

  • Added golang.org/x/tools/go/packages integration: when the Go toolchain is available, the cataloger calls packages.Load with the "all" pattern for every go.mod file found in a dir scan target, loading complete dependency information.
  • The resulting modules are then used together with packages.Visit to build the full import graph and get a complete picture of the project's dependencies. This also allows go module license files to be loaded from the mod cache into the license classifier for better license accuracy (see the notes below).
  • The main module can now be extracted for each go.mod finding, an improvement over the default process, which only reads the require block (see the notes below).
  • If the Go toolchain is not available or fails to load packages, the cataloger falls back to the original go.mod parsing behavior.
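
The fallback in the last bullet can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: the Module type, the loader function type, discover, and errNoToolchain are hypothetical names, and the two loaders are stubs standing in for the packages.Load path and the go.mod parsing path.

```go
package main

import (
	"errors"
	"fmt"
)

// Module is a hypothetical, simplified stand-in for a cataloged Go module.
type Module struct {
	Path    string
	Version string
}

// loader is any strategy that returns the modules for a go.mod location.
type loader func(dir string) ([]Module, error)

// errNoToolchain simulates the Go toolchain being unavailable (assumed name).
var errNoToolchain = errors.New("go toolchain not available")

// discover tries the toolchain-backed loader first and falls back to plain
// go.mod parsing when the toolchain is missing or the load fails.
// The boolean result reports whether the toolchain path was used.
func discover(dir string, toolchain, goMod loader) ([]Module, bool, error) {
	if toolchain != nil {
		if mods, err := toolchain(dir); err == nil {
			return mods, true, nil
		}
	}
	mods, err := goMod(dir)
	return mods, false, err
}

func main() {
	// Stub: a host where the toolchain-backed load always fails.
	noToolchain := func(string) ([]Module, error) {
		return nil, errNoToolchain
	}
	// Stub: the result of reading the require block of a go.mod file.
	fromGoMod := func(string) ([]Module, error) {
		return []Module{{Path: "github.com/example/dep", Version: "v1.2.3"}}, nil
	}

	mods, usedToolchain, err := discover(".", noToolchain, fromGoMod)
	fmt.Println(mods, usedToolchain, err)
}
```

Passing the two strategies in as functions keeps the fallback decision testable without a Go toolchain on the host.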

Limitations

  • The implementation first tries the Go toolchain, then fills in any missing packages from the go.mod file so that no dependencies are missed. This accounts for dependencies gated behind build flags, architecture, OS, or build tags that are incompatible with the current system, as well as other environment-specific settings that affect which modules the go tooling reveals when inspecting the dependency tree with go list.
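
As a rough sketch of the fill-in step described above (an illustration under assumptions, not the PR's implementation): the Module type and mergeModules are hypothetical, modules are keyed by path only, and the Source values mirror the go-source and go-module package types mentioned later in this description.

```go
package main

import (
	"fmt"
	"sort"
)

// Module is a hypothetical, simplified record of a discovered module.
// Source records which discovery path produced it.
type Module struct {
	Path    string
	Version string
	Source  string // "go-source" (toolchain) or "go-module" (go.mod)
}

// mergeModules keeps every module reported by the toolchain and adds any
// go.mod requirement that the toolchain did not surface, for example one
// hidden behind a build tag that does not match the current system.
func mergeModules(fromToolchain, fromGoMod []Module) []Module {
	seen := make(map[string]bool, len(fromToolchain))
	merged := make([]Module, 0, len(fromToolchain)+len(fromGoMod))

	// Toolchain results go first so they win over go.mod entries.
	for _, m := range fromToolchain {
		if seen[m.Path] {
			continue
		}
		seen[m.Path] = true
		merged = append(merged, m)
	}
	// Fill in only what the toolchain missed.
	for _, m := range fromGoMod {
		if seen[m.Path] {
			continue
		}
		seen[m.Path] = true
		merged = append(merged, m)
	}

	// Sort by path for stable output.
	sort.Slice(merged, func(i, j int) bool { return merged[i].Path < merged[j].Path })
	return merged
}

func main() {
	toolchain := []Module{
		{Path: "github.com/example/a", Version: "v1.0.0", Source: "go-source"},
	}
	goMod := []Module{
		{Path: "github.com/example/a", Version: "v1.0.0", Source: "go-module"},
		{Path: "github.com/example/windows-only", Version: "v0.3.0", Source: "go-module"},
	}
	for _, m := range mergeModules(toolchain, goMod) {
		fmt.Println(m.Path, m.Version, m.Source)
	}
}
```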

The key changes were made to syft/syft/pkg/cataloger/golang/parse_go_mod.go

The enhancement provides more detailed dependency information when possible while maintaining backward compatibility with systems that don't have access to go list commands.

Notes

This PR was tested against multiple repositories to illustrate the improvements made to the cataloger:

caddy/
client/
consul/
cue/
flux2/
go-containerregistry/
go-struct-converter/
golangci-lint/
kind/
minio/
prometheus/
stereoscope/

Improved Package Discovery:

For repositories such as go-containerregistry, new modules are now discovered because of the source analysis:

syft -o json --override-default-catalogers go-module-file-cataloger dir:../go-source-test/repos/go-containerregistry > a.json
go run cmd/syft/main.go -o json --override-default-catalogers go-module-file-cataloger dir:../go-source-test/repos/go-containerregistry/ > b.json

jq -s '
 (.[0].artifacts | map(.name) // []) as $A |
 (.[1].artifacts | map(.name) // []) as $B |
 {
   only_syft:   (($A - $B) | unique | sort),
   only_go_run: (($B - $A) | unique | sort)
 }
' a.json b.json
{
  "only_syft": [],
  "only_go_run": [
    "github.com/google/go-containerregistry/cmd/krane",
    "github.com/google/go-containerregistry/pkg/authn/k8schain"
  ]
}

The command above can be run against each of the test repositories to show the new modules included in the results.

Improved License Discovery

For a project like minio, syft currently fails to discover any licenses when given a go.mod file as part of the cataloging process. All 295 packages come back with an empty license list:

syft -o json --override-default-catalogers go-module-file-cataloger dir:../go-source-test/repos/minio | jq '.artifacts | [ .[] | select(.licenses == []) ] | length'
 ✔ Indexed file system  ../go-source-test/repos/minio
 ✔ Cataloged contents  e1b7a8e4e8802913e7811ccc122df8d58a539e901b87e2b1f170a2acfd60b9dd
   ├── ✔ Packages                        [295 packages]
   ├── ✔ Executables                     [0 executables]
   ├── ✔ File metadata                   [5 locations]
   └── ✔ File digests                    [5 files]
295

With this PR, that number drops to 29 (of 301 packages):

go run cmd/syft/main.go -o json --override-default-catalogers go-module-file-cataloger dir:../go-source-test/repos/minio | jq '.artifacts | [ .[] | select(.licenses == []) ] | length'
 ✔ Indexed file system  ../go-source-test/repos/minio
 ✔ Cataloged contents  e1b7a8e4e8802913e7811ccc122df8d58a539e901b87e2b1f170a2acfd60b9dd
   ├── ✔ Packages                        [301 packages]
   ├── ✔ Executables                     [0 executables]
   ├── ✔ File metadata                   [6 locations]
   └── ✔ File digests                    [6 files]
29

Similar improvements in license detection and package discovery can be seen for the listed repositories.

Most of the packages that still lack licenses are ones that can only be discovered via the go.mod file. A small percentage have license files for which the classifier returns no result.

Packages may still be discoverable only via the go.mod file because of build flags, build tags (including os, arch, or more complex header declarations), or other go build settings that are not set for the environment in which the underlying tools run go list. For minio, the split looks like this:

go run cmd/syft/main.go -o json --override-default-catalogers go-module-file-cataloger dir:../go-source-test/repos/minio |
jq '{
 go_source: ([.artifacts[] | select(.type=="go-source")] | length),
 go_module: ([.artifacts[] | select(.type=="go-module")] | length)
}'

 ✔ Indexed file system  ../go-source-test/repos/minio
 ✔ Cataloged contents  e1b7a8e4e8802913e7811ccc122df8d58a539e901b87e2b1f170a2acfd60b9dd
   ├── ✔ Packages                        [301 packages]
   ├── ✔ Executables                     [0 executables]
   ├── ✔ File metadata                   [6 locations]
   └── ✔ File digests                    [6 files]
{
  "go_source": 281,
  "go_module": 20
}

Potential Breaking Consideration

The go-source package type was added here partly to distinguish the origin of packages under the new cataloging process. I'm not sure whether this constitutes a breaking change. If this PR were merged as is, users WOULD see different output for identical scan targets between the latest syft and the next release, because packages previously cataloged as go-module would be 'upgraded' to go-source.

I left the metadata the same between the two and was able to merge the digest information from go.sum with the go-source information. If we want, we can remove the go-source type and leave these all as go-module; that might be the best way forward.
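
To illustrate what merging go.sum digests into package records can look like, here is a minimal sketch. It is not the PR's code: Package, parseGoSum, and attachDigests are hypothetical names, and the hashes in the example are placeholders rather than real digests. It relies only on the go.sum line format of module path, version (optionally suffixed with /go.mod), and hash.

```go
package main

import (
	"bufio"
	"fmt"
	"strings"
)

// Package is a hypothetical, simplified package record.
type Package struct {
	Name     string
	Version  string
	Type     string // "go-source" or "go-module"
	H1Digest string
}

// parseGoSum maps "module@version" to its h1 digest. Lines whose version
// ends in "/go.mod" hash only the go.mod file and are skipped here.
func parseGoSum(content string) map[string]string {
	digests := make(map[string]string)
	sc := bufio.NewScanner(strings.NewReader(content))
	for sc.Scan() {
		fields := strings.Fields(sc.Text())
		if len(fields) != 3 || strings.HasSuffix(fields[1], "/go.mod") {
			continue
		}
		digests[fields[0]+"@"+fields[1]] = fields[2]
	}
	return digests
}

// attachDigests returns a copy of pkgs with digests filled in wherever
// go.sum has an entry; the package type is left untouched.
func attachDigests(pkgs []Package, digests map[string]string) []Package {
	out := make([]Package, len(pkgs))
	for i, p := range pkgs {
		p.H1Digest = digests[p.Name+"@"+p.Version]
		out[i] = p
	}
	return out
}

func main() {
	// Placeholder hashes, not real digests.
	goSum := "github.com/example/dep v1.2.3 h1:PLACEHOLDERHASH=\n" +
		"github.com/example/dep v1.2.3/go.mod h1:PLACEHOLDERMODHASH=\n"

	pkgs := []Package{
		{Name: "github.com/example/dep", Version: "v1.2.3", Type: "go-source"},
	}
	for _, p := range attachDigests(pkgs, parseGoSum(goSum)) {
		fmt.Println(p.Name, p.Version, p.Type, p.H1Digest)
	}
}
```

Because the digest is keyed by module path and version, the same lookup works whether a package ends up typed as go-source or go-module.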

I've removed the test and searchPattern configuration options for now.

The respective values of true and all for these options are more in line with syft's philosophy of cataloging ALL the things.

Feature

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Documentation (updates the documentation)

Checklist:

  • Documentation needs to be updated
  • I have added unit tests that cover changed behavior <---
  • I have tested my code in common scenarios and confirmed there are no regressions
  • I have added comments to my code, particularly in hard-to-understand sections

Signed-off-by: Christopher Phillips <[email protected]>
@spiffcs spiffcs changed the title feat: merge mod and source into single cataloger feat: combine mod and source into single cataloger Aug 11, 2025
@spiffcs spiffcs changed the title feat: combine mod and source into single cataloger feat: combine go module file and go source discovery into single cataloger Aug 11, 2025
@spiffcs spiffcs requested a review from wagoodman August 11, 2025 13:56
@spiffcs (Contributor, Author) commented Aug 12, 2025

Also fixes: #3451

Labels: enhancement (New feature or request)
Development

Successfully merging this pull request may close these issues.

Catalog entire build list for Go projects, not just packages listed in go.mod