Go: mass-convert taint-flow models to models-as-data format #12562
Closed
smowton wants to merge 73 commits into github:main
Contributor
You seem to have accidentally checked in a binary file: `go/ql/test/query-tests/Security/CWE-681/IncorrectIntegerConversion`
From looking at this before, I remember a few of the stumbling blocks. Can you comment on your approach for these three issues?
owen-mc
reviewed
Mar 28, 2023
go/ql/lib/ext/ghproxy-9d2.pages.dev.evanphx.json-patch.model.yml
owen-mc
reviewed
Mar 30, 2023
This means that when a function has both a real body and a summary (usually because it has a real definition in source and implements an interface that has a model), two callables are created and dispatch considers both possible paths. This specifically overcomes the difficulty with ParameterNodes: the real callable, if any, may or may not define an SsaNode for a parameter, because the parameter may be unused or anonymous. Now the synthetic callable will always have parameter nodes, while the real one may or may not, depending on whether a definition is present and whether it names and uses its parameter.
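As a rough illustration of the case described above, the Go sketch below shows two concrete methods that implement an interface: one with an anonymous parameter and one with a named, used parameter. The types `discardWriter` and `countingWriter` are hypothetical names invented for this example, and whether a summary exists for the interface method is an assumption, not something this snippet establishes.

```go
package main

import (
	"fmt"
	"io"
)

// discardWriter is a hypothetical type. Its Write parameter is anonymous,
// so the method body has no variable that refers to the incoming bytes.
type discardWriter struct{}

func (discardWriter) Write([]byte) (int, error) { return 0, nil }

// countingWriter is a hypothetical type whose Write names and uses its
// parameter, so the body does contain a definition for it.
type countingWriter struct{ n int }

func (c *countingWriter) Write(p []byte) (int, error) {
	c.n += len(p)
	return len(p), nil
}

func main() {
	// Both calls go through the io.Writer interface, so an analysis has to
	// consider the interface method as well as each concrete body.
	var w io.Writer = discardWriter{}
	fmt.Fprint(w, "tainted")

	cw := &countingWriter{}
	w = cw
	fmt.Fprint(w, "tainted")
	fmt.Println(cw.n)
}
```

Only the second method gives an analysis a named parameter to attach a node to, which is the asymmetry the synthetic callable is meant to smooth over.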
This reverts commit 12eaedc. We can't do this now, because nothing guarantees that an interface has actually been extracted, and therefore that a model will be applied. Explicitly modelling methods that may be interface implementations, where the interface is in a different package, may therefore still make a difference to behaviour.
In some cases multiple return value outputs can be coalesced, and in others we had accidentally conflated two independent flows (e.g. merging Arg1 -> Arg2 and Arg3 -> Arg4 into one model accidentally introduced Arg1 -> Arg4 and Arg3 -> Arg2).
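To make the conflation concrete, here is a minimal Go sketch of a function with two independent flows. `copyBoth` and its parameter names are hypothetical; the names mirror the Arg1..Arg4 labels used in the message above rather than any particular model syntax.

```go
package main

import "fmt"

// copyBoth is a hypothetical function with two independent flows:
// arg1 -> arg2 and arg3 -> arg4. Nothing moves from arg1 to arg4 or
// from arg3 to arg2.
func copyBoth(arg1, arg2, arg3, arg4 []byte) {
	copy(arg2, arg1)
	copy(arg4, arg3)
}

func main() {
	dst1 := make([]byte, 3)
	dst2 := make([]byte, 3)
	copyBoth([]byte("abc"), dst1, []byte("xyz"), dst2)
	fmt.Println(string(dst1), string(dst2))
}
```

A model that lists both inputs against both outputs would claim that data from `arg1` reaches `arg4`, which this function never does; modelling the two flows separately avoids that.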
This referred to a private type
Contributor
Author
@owen-mc all comments addressed
Contributor
Author
(DCA shows quite-bad (2x) performance impact at the moment; looking at resolving that now)
These are cheap and frequently-used, and magicking them with respect to `interpretPackage` was yielding expensive, unnecessary regex operations.
owen-mc
reviewed
Mar 31, 2023
Contributor
Author
Perf fix pushed, and a fresh DCA started
Contributor
Can you be more specific about what needs fixing, and what (if anything) is blocking it?
owen-mc
previously approved these changes
Mar 31, 2023
Contributor
owen-mc
left a comment
Approved, subject to performance checks and looking into any alerts that are introduced or lost.
This was addressed by adding `getAPackageWithSummarizedCallables`
Golang's historical approach has been that receiver arguments specifically are not tracked across virtual (interface) dispatch (ordinary arguments are), but models of interface methods apply to both the receiver and ordinary arguments. Here I approach that problem by introducing two distinct argument/parameter positions: -1 (ordinary receiver) and -2 (interface receiver). For a non-virtual call, the receiver is passed in the ordinary receiver position (-1), resulting in flow to whatever concrete method is called and perhaps to a SummarizedCallable model too. For a virtual call, it is passed in the interface receiver position (-2), perhaps matching the interface receiver of a SummarizedCallable model (it cannot match a concrete method, since the interface method does not have a body).
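The two kinds of call that the message distinguishes can be seen in a small Go sketch. The `Namer` interface, the `user` type and both helper functions are hypothetical; the comments map each call onto the positions described above under that reading of the message.

```go
package main

import (
	"fmt"
	"strings"
)

// Namer is a hypothetical interface; its method has no body of its own.
type Namer interface{ Name() string }

// user is a hypothetical concrete type implementing Namer.
type user struct{ name string }

func (u user) Name() string { return strings.ToUpper(u.name) }

// viaConcrete makes a non-virtual call: the target is known statically,
// so the receiver would be passed in the ordinary receiver position (-1).
func viaConcrete(u user) string { return u.Name() }

// viaInterface makes a virtual call through the interface, so the receiver
// would be passed in the interface receiver position (-2), where only a
// model of Namer.Name could pick it up.
func viaInterface(n Namer) string { return n.Name() }

func main() {
	u := user{name: "alice"}
	fmt.Println(viaConcrete(u), viaInterface(u))
}
```

At runtime both calls reach the same method body; the distinction only matters to the analysis, which sees a concrete target in one case and an interface method in the other.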
Contributor
Author
The dataflow-hook variant of this PR was merged via #12750
This converts summaries specifically (not sources or sinks) to Models-as-Data (MaD) form.
Exceptions:
- `TaintTracking::FunctionModel`.
- `append` and the `unsafe` package, which work more like macros than functions in that they don't have a single typing in the Go type system (they could have generic type signatures, but don't as of now). For example, `append` can work like `append([]string, ...string) []string`, or like `append([]int, ...int) []int`; we currently extract these with an `InvalidType` describing the whole function (i.e., `Callable.getType()` actually doesn't work on these methods, since `InvalidType` is not a `SignatureType`).

How this was achieved:
1. `TaintTracking::FunctionModel` `codeql test run go/ql/test --keep-databases`, and the results were grouped by package
2. `$ANYVERSION` token was manually introduced to enable MaD specification of e.g. `my/package/subpackage` with the correct splitting into `package("my/package", "subpackage")`, thus accepting e.g. `my/package/v1/subpackage`. The default would be `package("my/package/subpackage", "")`, which suffices in most cases.
3. `TaintTracking::FunctionModel`s were deleted (or lost that subtype, if the model also serves some other purpose).

Testing / verification: The method in step 1 would omit some models that didn't have a test, especially ones whose tests currently use stubs that don't define the modelled function. To avoid this I searched for quoted strings deleted from the diff and which didn't occur in `go/ql/test/**/*.go`. This revealed around 10 missing tests, some of which had resulted in dropped models. Finally I took a by-eye pass over the remaining models to look for possible methods with missing tests but insufficiently distinctive names to be caught by that first pass.

Lots of non-inline-expectation path queries have insignificant test changes due to nodes, edges and subpaths varying depending on how modelling was done.
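The effect of splitting a path into package and subpackage, as described for the `$ANYVERSION` token, can be sketched in Go. This is only an illustration of the matching behaviour the description implies, not the CodeQL implementation: `matchesModel` is a hypothetical helper, and the `/v[0-9]+` version pattern is an assumption.

```go
package main

import (
	"fmt"
	"regexp"
)

// matchesModel is a hypothetical helper. It reports whether importPath is
// covered by a model written as package(base, sub), allowing an optional
// major-version element such as "/v1" directly after base. The "/v[0-9]+"
// pattern is an assumption made for this sketch.
func matchesModel(base, sub, importPath string) bool {
	pattern := "^" + regexp.QuoteMeta(base) + "(/v[0-9]+)?"
	if sub != "" {
		pattern += "/" + regexp.QuoteMeta(sub)
	}
	pattern += "$"
	return regexp.MustCompile(pattern).MatchString(importPath)
}

func main() {
	// Split form: the version element may sit between package and subpackage.
	fmt.Println(matchesModel("my/package", "subpackage", "my/package/v1/subpackage"))
	fmt.Println(matchesModel("my/package", "subpackage", "my/package/subpackage"))

	// Default form: the whole path is the package, so a version element in
	// the middle of the path is not accepted.
	fmt.Println(matchesModel("my/package/subpackage", "", "my/package/v1/subpackage"))
}
```

Under these assumptions the split form accepts both the versioned and unversioned paths, while the default form rejects a version element in the middle of the path, which is why the split had to be introduced by hand where it matters.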