feat: combine go module file and go source discovery into single cataloger #4127
Description
This PR enhances the GoModuleFileCataloger with the following improvements:
go.mod files found for a given dir target scan
go.mod findings, which is an improvement over the default process which just reads the require block (see notes below)
Limitations
go list
The key changes were made to syft/syft/pkg/cataloger/golang/parse_go_mod.go
The enhancement provides more detailed dependency information when possible while maintaining backward compatibility with systems that don't have access to go list commands.
Notes
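As context for the comparisons in these notes, the default process mentioned in the description reads only the require block of go.mod. A minimal sketch of that baseline follows, using only the standard library. The requirement type and parseRequires helper are hypothetical names for illustration and are not syft's actual parser.

```go
package main

import (
	"bufio"
	"fmt"
	"strings"
)

// requirement is a hypothetical type for this sketch; it is not syft's package model.
type requirement struct {
	Path    string
	Version string
}

// parseRequires collects module requirements from go.mod text.
// It handles single-line "require path version" directives and
// parenthesized require blocks, and drops trailing comments such as
// "// indirect". Other directives (replace, exclude, ...) are ignored.
func parseRequires(goMod string) []requirement {
	var reqs []requirement
	inBlock := false
	sc := bufio.NewScanner(strings.NewReader(goMod))
	for sc.Scan() {
		line := sc.Text()
		if i := strings.Index(line, "//"); i >= 0 {
			line = line[:i] // strip comments
		}
		fields := strings.Fields(line)
		switch {
		case len(fields) == 0:
			continue
		case inBlock && fields[0] == ")":
			inBlock = false
		case inBlock && len(fields) >= 2:
			reqs = append(reqs, requirement{Path: fields[0], Version: fields[1]})
		case fields[0] == "require" && len(fields) >= 2 && fields[1] == "(":
			inBlock = true
		case fields[0] == "require" && len(fields) >= 3:
			reqs = append(reqs, requirement{Path: fields[1], Version: fields[2]})
		}
	}
	return reqs
}

func main() {
	// Sample go.mod content made up for this example.
	const sample = `module example.com/demo

go 1.22

require github.com/google/uuid v1.6.0

require (
	github.com/spf13/cobra v1.8.0
	golang.org/x/sys v0.20.0 // indirect
)
`
	for _, r := range parseRequires(sample) {
		fmt.Println(r.Path, r.Version)
	}
}
```

A list built this way only reflects what go.mod declares. It says nothing about which packages the source actually imports, which is the gap the source analysis in this PR is meant to close.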
This PR was tested against multiple repositories to illustrate the improvements made to the cataloger:
Improved Package Discovery:
For things like go-containerregistry, new modules are now discovered because of the source analysis:
The above command can be run for all the test repositories to show new modules being included in the results.
Improved License Discovery
For a project like minio, syft currently has issues discovering any of the licenses when given a go.mod file as part of the cataloging process.
With this PR we can see this number is down to 29.
Similar improvements in license detection and package discovery can be seen for the listed repositories.
Most of the packages that still do not have licenses are packages that can only be discovered via the go.mod file. A small percentage also have licenses that do not return a result from the classifier.
Packages may still need to be discovered via the go.mod file because of build flags, build tags (including os, arch, or more complex header declarations), or other go build settings that are not set for a given environment when the underlying tools run go list.
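To illustrate why build tags can hide packages from go list, here is a small sketch using the standard library's go/build/constraint package. The matches helper is a hypothetical name for this example, and the tag set is a simplification: the real go tooling also considers implicit tags such as Go release versions and cgo.

```go
package main

import (
	"fmt"
	"go/build/constraint"
)

// matches reports whether a //go:build line is satisfied when only the
// given tags are set. The tag set stands in for the environment that
// go list would evaluate (GOOS, GOARCH, and any -tags values).
func matches(line string, tags ...string) (bool, error) {
	expr, err := constraint.Parse(line)
	if err != nil {
		return false, err
	}
	set := make(map[string]bool, len(tags))
	for _, t := range tags {
		set[t] = true
	}
	return expr.Eval(func(tag string) bool { return set[tag] }), nil
}

func main() {
	// A file guarded by this line is only compiled on windows/amd64.
	const line = "//go:build windows && amd64"
	for _, env := range [][]string{{"linux", "amd64"}, {"windows", "amd64"}} {
		ok, err := matches(line, env...)
		if err != nil {
			panic(err)
		}
		fmt.Printf("%v -> file included: %v\n", env, ok)
	}
}
```

When a constraint evaluates to false for the scanning environment, the file is skipped along with any imports that appear only in it, so modules reached solely through that file would still have to come from go.mod.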
Potential Breaking Consideration
The go source package type was added here partially to show the difference in origin for packages found by the new cataloging process. I'm not sure if this constitutes a breaking change or not. If this PR were to merge as is, users WOULD see a different output for identical scan targets between the latest syft and the next release. This is because packages previously cataloged as go-module would be 'upgraded' to go-source given this new behavior.
I left the metadata the same between the two and was able to merge the digest information from go-sum with the go-source information. If we want, we can just remove the go_source type and leave these all as go_modules; that might be the best way forward.
I've removed the configuration options for test and searchPattern for now. The respective values of true and all for these are more in line with syft's philosophy on cataloging ALL the things.
Feature
Checklist: