Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

C++: Constant type-bounds in the new range analysis #13783

Open
wants to merge 2 commits into
base: main
Choose a base branch
from

Conversation

MathiasVP
Copy link
Contributor

@MathiasVP MathiasVP commented Jul 20, 2023

This PR adds type-based bounds in the new range analysis library. This allows us to deduce that fewer things overflow, and thus allows us exclude fewer bounds.

I'll test this by locally rebasing this branch onto #12505 and running DCA on that.

@github-actions github-actions bot added the C++ label Jul 20, 2023
@@ -936,7 +936,7 @@ void two_bounds_from_one_test(short ss, unsigned short us) {
range(ss); // -32768 .. 32767
}

if (ss + 1 < sizeof(int)) { // $ overflow=+
if (ss + 1 < sizeof(int)) { // $ overflow=-
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍 verified experimentally

@MathiasVP MathiasVP changed the title C++: Constant type bounds in the new range analysis C++: Constant type-bounds in the new range analysis Jul 21, 2023
@MathiasVP MathiasVP marked this pull request as ready for review July 22, 2023 08:13
@MathiasVP MathiasVP requested a review from a team as a code owner July 22, 2023 08:13
@MathiasVP
Copy link
Contributor Author

MathiasVP commented Jul 22, 2023

I've verified the results and they LGTM. I'm still a bit unsure about whether we should suppress constant bounds arising purely from types such as:

void f(unsigned int ui) {
  unsigned long long ull = ui;
  range(ull); // we infer that `ull <= UINT_MAX` here. Is that what we want?
}

In any case, I think we can push this feature to a future PR. I've noted this down in our internal issue.

@MathiasVP MathiasVP added the no-change-note-required This PR does not need a change note label Jul 22, 2023
@geoffw0
Copy link
Contributor

geoffw0 commented Jul 24, 2023

Why wouldn't we want to deduce that ull <= UINT_MAX in the above example?

Copy link
Contributor

@jketema jketema left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some small comments, just to further my understanding of the code.

@MathiasVP
Copy link
Contributor Author

Why wouldn't we want to deduce that ull <= UINT_MAX in the above example?

Because it's not a very precise bound, and many users have been confused by false positives caused by such very-large-but-sound bounds obtained from type-information only. For example, this change was made because we wanted to be able to distinguish between precise bounds found by bounds from guards, and less precise bounds from type-information, for a high precision version of the cpp/overrunning-write query.

And then there's this beauty from an external contributor who had a similar problem: https://github.com/github/codeql/blob/main/cpp/ql/src/experimental/Security/CWE/CWE-561/FindIncorrectlyUsedSwitch.ql#L21

@jketema
Copy link
Contributor

jketema commented Jul 24, 2023

Why wouldn't we want to deduce that ull <= UINT_MAX in the above example?

Because it's not a very precise bound, and many users have been confused by false positives caused by such very-large-but-sound bounds obtained from type-information only. For example, this change was made because we wanted to be able to distinguish between precise bounds found by bounds from guards, and less precise bounds from type-information, for a high precision version of the cpp/overrunning-write query.

And then there's this beauty from an external contributor who had a similar problem: https://github.com/github/codeql/blob/main/cpp/ql/src/experimental/Security/CWE/CWE-561/FindIncorrectlyUsedSwitch.ql#L21

Not as part of this PR, but as each bound comes with a reason, we might want to explore whether we can add a new reason for the type bounds that are introduced here.

@jketema
Copy link
Contributor

jketema commented Jul 25, 2023

The DCA alert changes do not make any kind of sense to me. Any clue what was going on there?

@MathiasVP
Copy link
Contributor Author

MathiasVP commented Jul 25, 2023

  • The cpp/uncontrolled-arithmetic ones in Samate definitely LGTM since they're all in functions annotated as TPs.
    • I think the lost result on abseil-cpp-linux is correct assuming MultType resolves to uint128. If it resolves to uint64_t I'm less sure. Will need to check that locally, I think.
  • Most of the results on cpp/constant-comparison for ImageMagick/ImageMagick also LGTM, but I'm still not sure about all of them.
  • The cpp/integer-multiplication-cast-to-long results LGTM. We're now able to infer more bounds, so we can infer more bounds for the operands of multiplications.
  • The cpp/overrunning-write results looks like FP results that comes from inferring upper bounds purely from the type. That's also how the query behaves on main, though, so that's not an issue.
  • The cpp/unbounded-write results LGTM. All the lost results are because we correctly infer a smaller upper bound based on a type promotion.
  • Same story for cpp/uncontrolled-allocation-size.
  • The new result on cpp/very-likely-overrunning-write for lubomyr/bochs is a TP, I think: pname has space for 6 chars, but port%d can have length strlen("port") plus the size of max(i + 1), and i is upper bounded by hub.n_ports (which is an unsigned char). So max(i + 1) = 256. So it does seem like the sprintf requires 4 + 3 + 1 = 8 (for the null terminator) to be safe.
    • The new result on vim is absolutely horrible 😂. I can't parse that sprintf in my head, but I'm very sure the new results comes because we're able to infer that key_name[0] and key_name[1] can each be at most CHAR_MAX.

@jketema
Copy link
Contributor

jketema commented Jul 25, 2023

Thanks. I would be fine with having this merged, assuming DCA against main for both the nightly suite and the MCTV suite look fine. I'm not sure what @geoffw0 thinks?

@MathiasVP
Copy link
Contributor Author

Good point. I actually didn't test this against main (I only tested it against #12505), but I'll start such a run right away.

@MathiasVP
Copy link
Contributor Author

Uh oh. Since we now infer many many more constant bounds the cpp/constant-array-overflow query suddenly has a lot more results now. There's also two new FPs on cpp/invalid-pointer-deref with an alert message that reads very much like we're getting a type-based bound:

This read might be out of bounds, as the pointer might be equal to call to malloc + position + 65534.

I think that's another good reason to do what Jeroen said here:

Why wouldn't we want to deduce that ull <= UINT_MAX in the above example?

Because it's not a very precise bound, and many users have been confused by false positives caused by such very-large-but-sound bounds obtained from type-information only. For example, this change was made because we wanted to be able to distinguish between precise bounds found by bounds from guards, and less precise bounds from type-information, for a high precision version of the cpp/overrunning-write query.
And then there's this beauty from an external contributor who had a similar problem: https://github.com/github/codeql/blob/main/cpp/ql/src/experimental/Security/CWE/CWE-561/FindIncorrectlyUsedSwitch.ql#L21

Not as part of this PR, but as each bound comes with a reason, we might want to explore whether we can add a new reason for the type bounds that are introduced here.

But since this is causing FPs on cpp/invalid-pointer-deref it may be worth including the necessary changes to SemReason so that we can exclude these cases in the query.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
C++ no-change-note-required This PR does not need a change note
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants