Skip to content

Go: mass-convert taint-flow models to models-as-data format#12562

Closed
smowton wants to merge 73 commits intogithub:mainfrom
smowton:smowton/admin/convert-go-models
Closed

Go: mass-convert taint-flow models to models-as-data format#12562
smowton wants to merge 73 commits intogithub:mainfrom
smowton:smowton/admin/convert-go-models

Conversation

@smowton
Copy link
Contributor

@smowton smowton commented Mar 16, 2023

This converts specifically summaries (not sources or sinks) to Models-as-Data form.

Exceptions:

  • Reverse flow (e.g. a writer wrapper propagates taint from its result to its parameter)
  • Variadic functions, which are not yet correctly handled in Go (but will be when Go: Allow data flow through varargs parameters #11732 is merged)
  • Protobuf models (which are not recognised by simple inheritence) -- all of these kinds of models remain implemented via TaintTracking::FunctionModel.
  • Builtin methods like append and the unsafe package, which work more like macros than functions in that they don't have a single typing in the Go type system (they could have generic type signatures, but don't as of now). For example, append can work like append([]string, ...string) []string, or like append([]int, ...int) []int; we currently extract these with an InvalidType describing the whole function (i.e., Callable.getType() actually doesn't work on these methods, since InvalidType is not a SignatureType).

How this was achieved:

  1. Throwaway query https://gist.github.com/smowton/0a793dad5a64fe431bf0b034613eb797 was used to produce MaD rows from each function bound by a TaintTracking::FunctionModel
  2. That query was run against every database built by a complete run of in codeql test run go/ql/test --keep-databases, and the results were grouped by package
  3. The $ANYVERSION token was manually introduced to enable MaD specification of e.g. my/package/subpackage with the correct splitting into package("my/package", "subpackage"), thus accepting e.g. my/package/v1/subpackage. The default would be package("my/package/subpackage", ""), which suffices in most cases.
  4. All TaintTracking::FunctionModels were deleted (or lost that subtype, if the model also serves some other purpose).

Testing / verification: The method in step 1 would omit some models that didn't have a test, especially ones whose tests currently use stubs that don't define the modelled function. To avoid this I searched for quoted strings deleted from the diff and which didn't occur in go/ql/test/**/*.go. This revealed around 10 missing tests, some of which had resulted in dropped models. Finally I took a by-eye pass over the remaining models to look for possible methods with missing tests but insufficiently distinctive names to be caught by that first pass.

Lots of non-inline-expectation path queries have insignificant test changes due to nodes, edges and subpaths varying depending on how modelling was done.

@github-actions github-actions bot added the Go label Mar 16, 2023
@owen-mc
Copy link
Contributor

owen-mc commented Mar 17, 2023

You seem to have accidentally checked in a binary file: go/ql/test/query-tests/Security/CWE-681/IncorrectIntegerConversion

From looking at this before, I remember a few of the stumbling blocks.

  1. Some models have no tests at all. (From memory, it wasn't that many.)
  2. "Reverse flow" didn't work with MaD, so those models have to be kept in FunctionModel format.
  3. variadic parameters are dealt with differently between FunctionModel and MaD, so this needed a little bit of care to make sure they were converted correctly.

Can you comment on your approach for these three issues?

@smowton smowton force-pushed the smowton/admin/convert-go-models branch from 61a3a77 to fe52389 Compare March 24, 2023 15:55
@smowton smowton marked this pull request as ready for review March 24, 2023 15:56
@smowton smowton requested a review from a team as a code owner March 24, 2023 15:56
@smowton smowton force-pushed the smowton/admin/convert-go-models branch 2 times, most recently from e0b5318 to 80292c2 Compare March 28, 2023 13:58
@smowton smowton force-pushed the smowton/admin/convert-go-models branch from 80292c2 to a3944d0 Compare March 28, 2023 15:10
@smowton smowton force-pushed the smowton/admin/convert-go-models branch from 6acff9b to eb8bb58 Compare March 29, 2023 13:50
@smowton smowton added the no-change-note-required This PR does not need a change note label Mar 29, 2023
smowton added 5 commits March 30, 2023 10:33
This means that when a function has a real body and a summary (usually because it has a real definition in source, and implements an interface that has a model), two callables are created and dispatch considers both possible paths.

This specifically overcomes the difficulty with ParameterNodes when the real callable, if any, may or may not define an SsaNode, either because the real parameter is unused or because it is anonymous. Now the synthetic callable will always have parameter nodes, while the real one may or may not depending on whether a definition is present and
whether or not it names or uses its parameter.
smowton added 4 commits March 30, 2023 10:34
This reverts commit 12eaedc.

We can't do this now, because there is nothing to guarantee an interface has actually been extracted, and therefore whether a model will get applied. Therefore explicitly modelling methods that may be interface implementations where the interface is in a different package may still make a difference to behaviour.
In some cases multiple return value outputs can be coalesced, and in others we had accidentally conflated two independent flows (e.g. Arg1 -> Arg2 | Arg3 -> Arg4 led to accidentally introducing Arg1 -> Arg4 and Arg3 -> Arg2)
@smowton smowton force-pushed the smowton/admin/convert-go-models branch from 064bbc9 to 9901f52 Compare March 30, 2023 09:34
@smowton
Copy link
Contributor Author

smowton commented Mar 30, 2023

@owen-mc all comments addressed

@smowton
Copy link
Contributor Author

smowton commented Mar 30, 2023

(DCA shows quite-bad (2x) performance impact at the moment; looking at resolving that now)

These are cheap and frequently-used, and magicking them with respect to `interpretPackage` was yielding expensive, unnecessary regex operations.
@smowton
Copy link
Contributor Author

smowton commented Mar 31, 2023

Perf fix pushed, and a fresh DCA started

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you be more specific about what needs fixing, and what (if anything) is blocking it

owen-mc
owen-mc previously approved these changes Mar 31, 2023
Copy link
Contributor

@owen-mc owen-mc left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Approved, subject to performance checks and looking into any alerts that are introduced or lost.

This was addressed by adding `getAPackageWithSummarizedCallables`
owen-mc
owen-mc previously approved these changes Mar 31, 2023
Golang's historical approach has been that specifically receiver arguments are not tracked across virtual (interface) dispatch (ordinary arguments are), but models of interface methods do apply both to the receiver and ordinary arguments.

Here I approach that problem by introducing two distinct argument/parameter positions, -1 (ordinary receiver) and -2 (interface receiver). The receiver is passed in ordinary receiver position (-1) if the call is non-virtual, resulting in flow to whatever concrete method is called and perhaps a SummarizedCallable model too, while it is passed in
interface receiver position in the case of a virtual call, perhaps matching the interface receiver of a SummarizedCallable model (it cannot match a concrete method, since the interface method does not have a body).
@smowton
Copy link
Contributor Author

smowton commented Apr 12, 2023

The dataflow-hook variant of this PR was merged via #12750

@smowton smowton closed this Apr 12, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Go no-change-note-required This PR does not need a change note

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants

Comments