Skip to content
This repository has been archived by the owner on Oct 3, 2021. It is now read-only.

Fixing benchmarks with overflows from nla-digibench #1155

Merged
merged 14 commits into from
Oct 24, 2020

Conversation

divyeshunadkat
Copy link
Contributor

@divyeshunadkat divyeshunadkat commented Oct 5, 2020

Created the pull request based on the points and suggestions of @MartinSpiessl in the PR #920 and #1105

  • programs added to new and appropriately named directory

  • license present and acceptable (either in separate file or as comment at beginning of program)

  • contributed-by present (either in README file or as comment at beginning of program)

  • programs added to a .set file of an existing category, or new sub-category established (if justified)

  • intended property matches the corresponding .prp file

  • programs and expected answer added to a .yml file according to task definitions

  • data model present in task-definition file
  • original sources present
  • preprocessed files present
  • preprocessed files generated with correct architecture
  • Makefile added with correct content and without overly broad suppression of warnings

divyeshunadkat and others added 2 commits October 5, 2020 10:25
The program has an overflow so the verdict should be true.
@dbeyer
Copy link
Member

dbeyer commented Oct 7, 2020

@divyeshunadkat In the future, we would like to maintain the original source files as .c files,
and use extension .i for the preprocessed files.
If it is not too difficult, could you please "revert" the preprocessing, check-in those as .c files, and rename the current files as .i files? (Just for the files affected by this PR.)

Copy link
Member

@dbeyer dbeyer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Commit 21cbb7d introduces a problem:
"no-overflow" is the specification, that is, the specification requires that the program does not have an overflow.

Since the program has an overflow, this specification will be violated, which means the expected verdict is false.

Could you please revert this last commit (or correct me if I misunderstood).

@divyeshunadkat
Copy link
Contributor Author

@dbeyer
Thank you for the review. The benchmark hard.c has a overflow. I have reverted the commit you mentioned to set the verdict to false in the hard.yml file.

The original c files are added which do not require pre-processing. Hence, I have reverted the check for pre-processing in the comment. Do let me know if I have understood this correctly and if any other changes are needed.

@hernanponcedeleon
Copy link
Contributor

@divyeshunadkat can you please explain me why the new version of the programs (using unsigned int) cannot have an overflow?

While you now have one extra bit, you are using B = __VERIFIER_nondet_unsigned_int(); which can return the biggest possible number for an unsigned. Then B * p can again overflow if p>1.

I think the same kind of problem is present in egcd2.c and egcd3.c.

Copy link
Contributor

@hernanponcedeleon hernanponcedeleon left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

as I commented, I don't think the changes guarantee no-overflow

@divyeshunadkat
Copy link
Contributor Author

divyeshunadkat commented Oct 10, 2020

@hernanponcedeleon
Thank you for your review of the pull request and interest in resolving the overflow issue for the benchmarks in the repository. After the issue #1105 raised by @MartinSpiessl for PR #920 , I am trying to see how the concerns can be best resolved. According to me both the benchmarks have an overflow and that is the property I would like to specify for them. I recently understood that this can be achieved by setting the attribute "expected_verdict" to "false" in the yml file under the task definition "no-overflow". I believe you have raised a concern here since the "no-overflow" verdict is still set to "true" in "hardu.yml" file. I will update that to "false" once we confirm that it is the appropriate verdict and correct way of specifying the same.

I would also like to specify the reachability property with an appropriate verdict if possible when the program has only well-defined behaviors. I believe that irrespective of the overflow issue the reachability property can be specified along with an appropriate verified for all the benchmarks, due to the papers: [1], [2] , [3]

However, in order to adhere to the sv-comp rules for benchmarks with user-specified properties which contain overflows, and to be sensitive to the participating tools, I am trying to make these changes as per suggestions. I understand, from the issue #1166 , that you are also working towards correcting the task definitions and verdicts of programs with overflows to adhere to sv-comp rules and your comments there are helpful. Previous year, I too had given an attempt to removing the overflow from several benchmarks in nla-digibench but wasn't able to and am still trying to do so. Please do let me know if there are any issues in my reasoning so that I can make appropriate changes to the benchmarks in this PR.

@hernanponcedeleon
Copy link
Contributor

I believe you have raised a concern here since the "no-overflow" verdict is still set to "true" in "hardu.yml" file. I will update that to "false" once we confirm that it is the appropriate verdict and correct way of specifying the same.

This is only partially true. While you are right that the expected result is wrong for the no-overflow property, I'am also concerned (and thus #1166) that we have benchmarks with overflow participating in the reachability category. I know the CI will complain if it encounters a benchmarks with the no-data-race property set to false and any expected behaviour for reachability. In principle this check should be performed for any property that implies undefined behaviour, e.g. no-overflow, but I'm not sure this is the case. Maybe @dbeyer can comment on this.

I think the main point is that any benchmarks with overflow will result in undefined behaviour and thus we cannot claim any expected result for reachability.

My other concern (again raised in #1166) is what to do with benchmarks that are currently in the reachability category but we discover they have undefined behaviour (overflow in this case). In the ideal world we would keep a copy of the benchmark, use it for the corresponding category (e.g. no-overflow, no-data-race) and solve the problem in the benchmark for reachability, but as you said (and I experience the same during the last week trying to fix the benchmarks I reported in #1166), this is not always easy.

@divyeshunadkat
Copy link
Contributor Author

divyeshunadkat commented Oct 11, 2020

The commit b3fb86c fixes the overflow and hence in commit 7b81d2a I have changed the reachability verdict to true, thus restoring the program complexity such that the assertions in the program hold.

I believe that this fix can be applied to most benchmarks in nla-digibench (#1166 ) as well as other benchmarks with overflow (#1159 ).

@hernanponcedeleon
Copy link
Contributor

The changes look fine to me and I think this PR can be merged.

@dbeyer
Copy link
Member

dbeyer commented Oct 11, 2020

If there is an overflow of an int variable, then this is undefined behavior.

An overflow of an unsigned int variable is not undefined behavior (the value just overflows modulo 2^).

@hernanponcedeleon If you see overflows of int variables in programs that have a property different from no-overflow with expected_verdict: false, then this should be fixed.

Programs with undefined behavior are only good to verify that the program has undefined behavior; other properties do not make sense.

@divyeshunadkat I think that your latest commits are unnecessary because the overflow was for unsigned int which is completely fine.

@hernanponcedeleon
Copy link
Contributor

An overflow of an unsigned int variable is not undefined behavior (the value just overflows modulo 2^).

Thanks for the clarification, I was not aware of this. However this might impact the verification, so we should be cautious:

char x = 280;
assert(x == 24)

would be TRUE for the reachability category. I guess this is the kind of this the BitVector sub-category is dealing with.

@hernanponcedeleon If you see overflows of int variables in programs that have a property different from no-overflow with expected_verdict: false, then this should be fixed.

I'm not sure I fully understand. expected_verdict: false refers to no-overflow of to the property that is different?

@divyeshunadkat
Copy link
Contributor Author

@divyeshunadkat I think that your latest commits are unnecessary because the overflow was for unsigned int which is completely fine.

The latest commits ( b3fb86c and 7b81d2a ) make the assertion to hold with the full complexity of the program and remove all the overflows. Without the change in b3fb86c , the values of variables can still wrap around and violate the property.

This is what was requested in #1105 and something that we have been aspiring to for all the programs with overflows in nla-digibench and other folders as well.

@hernanponcedeleon
Copy link
Contributor

One possibility would be to keep three benchmarks

  • hard -> for no-overflow category
  • hard-u-l (currently called hardu) using unsigned and long long and expected result true
  • hard-u using unsigned by int variables and expected result false

@divyeshunadkat
Copy link
Contributor Author

One possibility would be to keep three benchmarks

* hard -> for no-overflow category

* hard-u-l (currently called hardu) using unsigned and long long and expected result true

* hard-u using unsigned by int variables and expected result false

Sure. Will do.

Can I also make these changes for the other benchmarks in nla-digibench as mentioned in #1166 ?

@hernanponcedeleon
Copy link
Contributor

Sure. It might be that for some benchmarks (I suspect egcd2 and egcd3) there is gonna be a violation even if we use long long so we have to be cautious about having the correct expected result, but at least we won't have undefined behavior if we use unsigned so I'm happy with this solution

@PhilippWendler PhilippWendler added the C Task in language C label Oct 12, 2020
@divyeshunadkat divyeshunadkat changed the title Only overflow check for hard.c and a new benchmark hardu.c with unsigned integers Fixing benchmarks with overflows from nla-digibench Oct 13, 2020
Copy link
Contributor

@hernanponcedeleon hernanponcedeleon left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think there are still problems with some of the benchmarks.

  • bresenham-ll: there are still overflows in lines 33,39,41,46. Applying the same solution as in line 28 should work.
  • egcd2-ll: it has int y and s*y*y which can overflow
  • geo3: it has int z, int a and a*z*y which can overflow
  • prod4br-ll: int x, int y and x*y which can overflow
  • egcd3-ll: I still suspect there is some problem with this benchmark, but it is hard to explain why. I'm running the verification on the code after using -O3 optimisations. The counter examples I get in the optimised code do not lead to counter examples in the original code. However I suspect this miss match between the result is due to the compiler performing aggressive optimisations due to undefined behaviour. Are you performing any analysis to check there are no overflows or just manually inspecting the benchmark? Also, in the last year, none of the tools managed to obtained a result for this benchmark so it might be just that the expected result is wrong.
  • hard-u: I don't think the expected result should be false. I imagine you are giving this expected result based on the counter example from Corrected verdict of hard.c (from PR #808) and added its correct version #920, but I don't think that is valid now that we use unsigned. If you still think the expected result should be false, can you please provide a counter example?

@divyeshunadkat
Copy link
Contributor Author

@hernanponcedeleon Many thanks for the review.

Implicit type conversion takes care of the overflows in bresenham-ll, egcd2-ll, geo3-ll, and prod4br-ll.

For hard.c, use the initializations A = 3388579422u and B = 1501435237u in place of non-det assignments. In the concrete trace, both assertions in the second loop and the post-condition assert(B == d) are violated.

@hernanponcedeleon
Copy link
Contributor

If by Implicit type conversion you mean that whenever we have an expression where one of the operands is e.g. long long and the other is int the latter is implicitly converted to long long, I think this is not completely true. This is true in the case where the expression is atomic (does not have sub-expression).

The problem are non atomic expressions, e.g. __VERIFIER_assert( 2*Y*x - 2*X*y - X + 2*Y - v == 0);. In this case v is long long but all the other variables are int. The intermediate values will be saved in local variables, and most of those intermediate variables will have type int and thus they can overflow.

Here are some parts of the llvm code generated where the above becomes clear:

call void @assume_abort_if_not(i32 %11)
%12 = load i32, i32* %3, align 4
%13 = sext i32 %12 to i64
%14 = mul nsw i64 2, %13

this corresponds to

assume_abort_if_not(X < 2147483647);
v = ((long long) 2 * Y) - X; 

and the casting is correctly converted to a 64-bits multiplication.

However __VERIFIER_assert( 2*Y*x - 2*X*y - X + 2*Y - v == 0); is converted into

%19 = load i32, i32* %3, align 4
%20 = mul nsw i32 2, %19
%21 = load i32, i32* %4, align 4
%22 = mul nsw i32 %20, %21
%23 = load i32, i32* %2, align 4
%24 = mul nsw i32 2, %23
%25 = load i32, i32* %5, align 4
%26 = mul nsw i32 %24, %25
%27 = sub nsw i32 %22, %26
%28 = load i32, i32* %2, align 4
%29 = sub nsw i32 %27, %28
%30 = load i32, i32* %3, align 4
%31 = mul nsw i32 2, %30
%32 = add nsw i32 %29, %31
%33 = sext i32 %32 to i64
%34 = load i64, i64* %6, align 8
%35 = sub nsw i64 %33, %34
%36 = icmp eq i64 %35, 0
%37 = zext i1 %36 to i32
call void @__VERIFIER_assert(i32 %37)

where you can see the multiplication are only 32-bits and thus the overflow is possible.

I think the above gives problem in bresenham-ll, egcd2-ll, geo3-ll, and prod4br-ll as I mentioned in my previous comment, but also in fermat1-ll and fermat1-ll.

Thanks for the counter example for hard-u, it looks you are right.

Note: there was a small typo (fixed already) in my previous comment: the long comment was about egcd3-ll and not egcd2-ll.

@PhilippWendler
Copy link
Member

The problem are non atomic expressions, e.g. __VERIFIER_assert( 2*Y*x - 2*X*y - X + 2*Y - v == 0);. In this case v is long long but all the other variables are int. The intermediate values will be saved in local variables, and most of those intermediate variables will have type int and thus they can overflow.

For completeness: C does not have atomic vs. non-atomic expressions, and there are no intermediate variables involved here. What LLVM does is just an implementation detail of one particular implementation.

What the C standard says, however, is that for certain operators (including the arithmetic operators) the "usual arithmetic conversions" (such as converting the operands to the "common real type") are performed before the operator is evaluated. This makes it clear that one has to look at each operator individually, deduce the common real type from its operand types according to §6.3.1.8 of the C standard and then check whether it can overflow. Expressions and in particular whether an operator is part of a larger expression are not relevant in this process. (Source: search for "usual arithmetic conversions" in the C standard)

So this does not change the outcome of what you claim, this is just the official reason for it.

@hernanponcedeleon
Copy link
Contributor

Thanks @PhilippWendler for the clarification.

@divyeshunadkat
Copy link
Contributor Author

@hernanponcedeleon @PhilippWendler Many thanks for your detailed, insightful and well articulated comments.

This begs the question that can an llvm pass be implemented that can propagate the type conversions to sub-expressions (if possible in an optimal way)? I believe that such a pass would be useful to many tools that are based on llvm IR, especially when detecting overflows.

I resorted to implicit type conversions to retain the readability of these files.

@divyeshunadkat
Copy link
Contributor Author

@hernanponcedeleon @PhilippWendler @dbeyer

Are there any other changes required before this PR can be merged?

@hernanponcedeleon
Copy link
Contributor

I haven't seen any commits after my last comment, so from my point of view, the benchmarks still have problems.

llvm has a local handling of the implicity casting when generating the
type cast instructions in the IR.
This commit can be reverted if llvm starts looking at the entire
expression as given in a C progam, rather than just the type of
the variables involved in the current expression, before
generating the needed type casts.
@divyeshunadkat
Copy link
Contributor Author

@hernanponcedeleon

Have pushed the changes with casts for each sub-expression that would require one.

@hernanponcedeleon
Copy link
Contributor

I still get problems with 3 benchmarks.

For egcd3-ll, I tried many fixes, but nothing worked. Since in previous years no tool managed to solve this benchmark, I still have some hope that the expected result might be wrong, but this is not something to be tackled in this PR.

I managed to implement solutions for the overflow in egcd-2 and bresenham-ll. You can take a look to my solutions here and here. If you agree with them, please add them to this PR and I'll approve.

For documentation, I explain the problems and solutions here.

bresenham-ll
Problem: multiplications Y*x, X*y and 2 * Y can still overflow
Solution: add a (long long) cast

egcd-2
Problem: you added a cast to avoid overflow of x*y and y*y. However since these are then multiplied by q and s, the result can again overflow
Solution: I added an assumption that x*y and y*y will fit in 32 bits (originally, all variables where ints so this assumption does not modify the initial intended behaviour of the benchmark). The way in which q and s are updated guarantee that q*x*y and s*y*y will not overflow under this assumption.

Can be reverted with same reason as the previous commit.
Copy link
Contributor

@hernanponcedeleon hernanponcedeleon left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks go now.

@dbeyer you can merge already.

Thanks @divyeshunadkat for all the work!

@divyeshunadkat
Copy link
Contributor Author

Many thanks @hernanponcedeleon for your meticulous reviews. Happy that together we resolved the issues in the benchmarks in a way that is useful to the competition. :) 👍

@divyeshunadkat
Copy link
Contributor Author

@dbeyer I have fixed the last change that you had requested about specifying the no-overflow property. The C files do not have any pre-processor directives and hence pre-processed files are not added. Kindly let me know if I must still add the pre-processed files or if there are any other changes required. Thanks.

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
C Task in language C
Development

Successfully merging this pull request may close these issues.

4 participants