C++: Constant type-bounds in the new range analysis #13783

MathiasVP · 2023-07-20T15:13:53Z

This PR adds type-based bounds to the new range analysis library. This lets us deduce that fewer expressions overflow, and thus lets us exclude fewer bounds.

I'll test this by locally rebasing this branch onto #12505 and running DCA on that.

…pper bound (and similarly for lower bounds).

geoffw0 · 2023-07-20T15:39:15Z

cpp/ql/test/library-tests/ir/range-analysis/SimpleRangeAnalysis_tests.cpp

@@ -936,7 +936,7 @@ void two_bounds_from_one_test(short ss, unsigned short us) {
    range(ss); // -32768 .. 32767
  }

-  if (ss + 1 < sizeof(int)) {  // $ overflow=+
+  if (ss + 1 < sizeof(int)) { // $ overflow=-


👍 verified experimentally
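A minimal sketch of why positive overflow is ruled out for `ss + 1`, assuming a platform where `int` is wider than `short` (the helper names below are hypothetical and not part of the test file). `ss` is promoted to `int` before the addition, so the sum stays within `int`; a negative sum then wraps when it is converted to `size_t` for the comparison with `sizeof(int)`, which is one plausible reading of the remaining `overflow=-` annotation.

```cpp
#include <cassert>
#include <climits>
#include <cstddef>
#include <type_traits>

// `short + int` is evaluated in `int` because of integer promotion.
static_assert(std::is_same<decltype(static_cast<short>(0) + 1), int>::value,
              "short operands are promoted to int before the addition");

// Largest value `ss + 1` can take, computed in a wider type (hypothetical helper).
constexpr long long max_of_ss_plus_one() {
  return static_cast<long long>(SHRT_MAX) + 1;
}

// True when `ss + 1` can never exceed INT_MAX, i.e. no positive overflow.
constexpr bool ss_plus_one_cannot_overflow() {
  return max_of_ss_plus_one() <= INT_MAX;
}

// Mirrors `ss + 1 < sizeof(int)` with the implicit conversion written out:
// the int sum is converted to size_t, so a negative sum becomes a huge value.
bool compare_like_the_test(short ss) {
  return static_cast<std::size_t>(ss + 1) < sizeof(int);
}
```

With these helpers, `compare_like_the_test(-2)` is false because `-1` converts to `SIZE_MAX`, while `compare_like_the_test(0)` is true.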

MathiasVP · 2023-07-22T08:16:35Z

I've verified the results and they LGTM. I'm still a bit unsure about whether we should suppress constant bounds arising purely from types such as:

void f(unsigned int ui) {
  unsigned long long ull = ui;
  range(ull); // we infer that `ull <= UINT_MAX` here. Is that what we want?
}

In any case, I think we can push this feature to a future PR. I've noted this down in our internal issue.
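To make the question concrete, here is a small sketch of the fact the analysis relies on (the `widen` and `type_bound_is_tighter` names are hypothetical): widening an `unsigned int` to `unsigned long long` preserves the value, so `UINT_MAX` is a sound upper bound even though nothing in the code constrains `ui`.

```cpp
#include <cassert>
#include <climits>

// Widening conversion: the value is preserved exactly, so the result
// can never exceed UINT_MAX regardless of the input.
unsigned long long widen(unsigned int ui) {
  unsigned long long ull = ui;
  return ull;
}

// The type-only bound is sound but loose: it only says that the value
// fits in the source type, not that any guard restricts it further.
constexpr bool type_bound_is_tighter() {
  return UINT_MAX < ULLONG_MAX;
}
```

The bound is correct for every input, which is exactly why it is sound and also why it can be far from what the program actually does.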

geoffw0 · 2023-07-24T13:07:25Z

Why wouldn't we want to deduce that ull <= UINT_MAX in the above example?

jketema

Some small comments, just to further my understanding of the code.

cpp/ql/lib/semmle/code/cpp/rangeanalysis/new/internal/semantic/SemanticExprSpecific.qll

cpp/ql/test/library-tests/ir/range-analysis/SimpleRangeAnalysis_tests.cpp

MathiasVP · 2023-07-24T13:19:50Z

Why wouldn't we want to deduce that ull <= UINT_MAX in the above example?

Because it's not a very precise bound, and many users have been confused by false positives caused by such very-large-but-sound bounds obtained from type information only. For example, this change was made because we wanted to distinguish between precise bounds obtained from guards and less precise bounds obtained from type information, for a high-precision version of the cpp/overrunning-write query.

And then there's this beauty from an external contributor who had a similar problem: https://github.com/github/codeql/blob/main/cpp/ql/src/experimental/Security/CWE/CWE-561/FindIncorrectlyUsedSwitch.ql#L21

jketema · 2023-07-24T15:04:14Z

Not as part of this PR, but as each bound comes with a reason, we might want to explore whether we can add a new reason for the type bounds that are introduced here.

jketema · 2023-07-25T10:34:16Z

The DCA alert changes do not make any kind of sense to me. Any clue what was going on there?

MathiasVP · 2023-07-25T11:33:58Z

- The cpp/uncontrolled-arithmetic ones in Samate definitely LGTM since they're all in functions annotated as TPs.
  - I think the lost result on abseil-cpp-linux is correct, assuming MultType resolves to uint128. If it resolves to uint64_t I'm less sure. I'll need to check that locally, I think.
- Most of the results on cpp/constant-comparison for ImageMagick/ImageMagick also LGTM, but I'm still not sure about all of them.
- The cpp/integer-multiplication-cast-to-long results LGTM. We're now able to infer more bounds, so we can infer more bounds for the operands of multiplications.
- The cpp/overrunning-write results look like FPs that come from inferring upper bounds purely from the type. That's also how the query behaves on main, though, so that's not an issue.
- The cpp/unbounded-write results LGTM. All the lost results are because we correctly infer a smaller upper bound based on a type promotion.
- Same story for cpp/uncontrolled-allocation-size.
- The new result on cpp/very-likely-overrunning-write for lubomyr/bochs is a TP, I think: pname has space for 6 chars, but port%d can have length strlen("port") plus the number of digits in max(i + 1), and i is upper bounded by hub.n_ports (which is an unsigned char). So max(i + 1) = 256, and the sprintf needs 4 + 3 + 1 = 8 bytes (the final 1 is for the null terminator) to be safe.
  - The new result on vim is absolutely horrible 😂. I can't parse that sprintf in my head, but I'm very sure the new result comes from us being able to infer that key_name[0] and key_name[1] can each be at most CHAR_MAX.
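The size arithmetic for the bochs case can be checked with a short sketch. The helper name `bytes_needed_for_port_name` is hypothetical; only the `port%d` format and the 6-byte `pname` buffer come from the discussion above. Calling `snprintf` with a null buffer and size 0 reports the formatted length without writing anything.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>

// Bytes needed to store "port<n>", including the null terminator.
std::size_t bytes_needed_for_port_name(int n) {
  // With a null buffer and size 0, snprintf only returns the length
  // the formatted string would have (excluding the terminator).
  int len = std::snprintf(nullptr, 0, "port%d", n);
  return len < 0 ? 0 : static_cast<std::size_t>(len) + 1;
}
```

For `n = 256` this gives 8 bytes, so a 6-byte buffer is only large enough while `i + 1` stays at 9 or below.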

jketema · 2023-07-25T12:05:54Z

Thanks. I would be fine with having this merged, assuming the DCA runs against main for both the nightly suite and the MCTV suite look fine. I'm not sure what @geoffw0 thinks.

MathiasVP · 2023-07-25T12:11:24Z

Good point. I actually didn't test this against main (I only tested it against #12505), but I'll start such a run right away.

MathiasVP · 2023-07-25T14:56:36Z

Uh oh. Since we now infer many more constant bounds, the cpp/constant-array-overflow query suddenly has a lot more results. There are also two new FPs on cpp/invalid-pointer-deref, with an alert message that reads very much like we're getting a type-based bound:

This read might be out of bounds, as the pointer might be equal to call to malloc + position + 65534.

I think that's another good reason to do what Jeroen said here:

Not as part of this PR, but as each bound comes with a reason, we might want to explore whether we can add a new reason for the type bounds that are introduced here.

But since this is causing FPs on cpp/invalid-pointer-deref it may be worth including the necessary changes to SemReason so that we can exclude these cases in the query.
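As a rough illustration of the idea, and not the actual SemReason API, the sketch below tags each bound with the reason it was derived and lets a consumer drop the type-only ones. Every name here (`BoundReason`, `UpperBound`, `drop_type_only_bounds`) is hypothetical, and the real change would be made in QL rather than C++.

```cpp
#include <cassert>
#include <vector>

// Hypothetical reasons a bound might carry.
enum class BoundReason { Guard, TypeOnly };

// Hypothetical model of an inferred upper bound with its reason.
struct UpperBound {
  long long value;
  BoundReason reason;
};

// Keep only bounds that did not come purely from type information,
// as a high-precision query might want to do.
std::vector<UpperBound> drop_type_only_bounds(const std::vector<UpperBound>& bounds) {
  std::vector<UpperBound> kept;
  for (const UpperBound& b : bounds) {
    if (b.reason != BoundReason::TypeOnly) {
      kept.push_back(b);
    }
  }
  return kept;
}
```

Given one guard-derived bound of 10 and one type-only bound of 65534, only the bound of 10 survives the filter.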

MathiasVP added 2 commits July 20, 2023 16:10

C++: Infer a constant upper bound whenever we convert to a 'larger' u…

c46e3d1

…pper bound (and similarly for lower bounds).

C++: Accept test changes.

9b2d527

github-actions bot added the C++ label Jul 20, 2023

geoffw0 reviewed Jul 20, 2023

MathiasVP changed the title ~~C++: Constant type bounds in the new range analysis~~ C++: Constant type-bounds in the new range analysis Jul 21, 2023

MathiasVP marked this pull request as ready for review July 22, 2023 08:13

MathiasVP requested a review from a team as a code owner July 22, 2023 08:13

MathiasVP added the no-change-note-required This PR does not need a change note label Jul 22, 2023

jketema reviewed Jul 24, 2023

