Fixing benchmarks with overflows from nla-digibench #1155

divyeshunadkat · 2020-10-05T05:00:55Z

Created the pull request based on the points and suggestions of @MartinSpiessl in the PR #920 and #1105

programs added to new and appropriately named directory
license present and acceptable (either in separate file or as comment at beginning of program)
contributed-by present (either in README file or as comment at beginning of program)
programs added to a .set file of an existing category, or new sub-category established (if justified)
intended property matches the corresponding .prp file
programs and expected answer added to a .yml file according to task definitions

data model present in task-definition file
original sources present
preprocessed files present
preprocessed files generated with correct architecture
Makefile added with correct content and without overly broad suppression of warnings

…gers

The program has an overflow so the verdict should be true.

dbeyer · 2020-10-07T07:50:53Z

@divyeshunadkat In the future, we would like to maintain the original source files as .c files,
and use extension .i for the preprocessed files.
If it is not too difficult, could you please "revert" the preprocessing, check-in those as .c files, and rename the current files as .i files? (Just for the files affected by this PR.)

dbeyer

Commit 21cbb7d introduces a problem:
"no-overflow" is the specification, that is, the specification requires that the program does not have an overflow.

Since the program has an overflow, this specification will be violated, which means the expected verdict is false.

Could you please revert this last commit (or correct me if I misunderstood).

This reverts commit 21cbb7d.

divyeshunadkat · 2020-10-07T13:00:42Z

@dbeyer
Thank you for the review. The benchmark hard.c has a overflow. I have reverted the commit you mentioned to set the verdict to false in the hard.yml file.

The original c files are added which do not require pre-processing. Hence, I have reverted the check for pre-processing in the comment. Do let me know if I have understood this correctly and if any other changes are needed.

hernanponcedeleon · 2020-10-09T09:47:07Z

@divyeshunadkat can you please explain me why the new version of the programs (using unsigned int) cannot have an overflow?

While you now have one extra bit, you are using B = __VERIFIER_nondet_unsigned_int(); which can return the biggest possible number for an unsigned. Then B * p can again overflow if p>1.

I think the same kind of problem is present in egcd2.c and egcd3.c.

hernanponcedeleon

as I commented, I don't think the changes guarantee no-overflow

divyeshunadkat · 2020-10-10T09:25:31Z

@hernanponcedeleon
Thank you for your review of the pull request and interest in resolving the overflow issue for the benchmarks in the repository. After the issue #1105 raised by @MartinSpiessl for PR #920 , I am trying to see how the concerns can be best resolved. According to me both the benchmarks have an overflow and that is the property I would like to specify for them. I recently understood that this can be achieved by setting the attribute "expected_verdict" to "false" in the yml file under the task definition "no-overflow". I believe you have raised a concern here since the "no-overflow" verdict is still set to "true" in "hardu.yml" file. I will update that to "false" once we confirm that it is the appropriate verdict and correct way of specifying the same.

I would also like to specify the reachability property with an appropriate verdict if possible when the program has only well-defined behaviors. I believe that irrespective of the overflow issue the reachability property can be specified along with an appropriate verified for all the benchmarks, due to the papers: [1], [2] , [3]

However, in order to adhere to the sv-comp rules for benchmarks with user-specified properties which contain overflows, and to be sensitive to the participating tools, I am trying to make these changes as per suggestions. I understand, from the issue #1166 , that you are also working towards correcting the task definitions and verdicts of programs with overflows to adhere to sv-comp rules and your comments there are helpful. Previous year, I too had given an attempt to removing the overflow from several benchmarks in nla-digibench but wasn't able to and am still trying to do so. Please do let me know if there are any issues in my reasoning so that I can make appropriate changes to the benchmarks in this PR.

hernanponcedeleon · 2020-10-10T09:44:46Z

I believe you have raised a concern here since the "no-overflow" verdict is still set to "true" in "hardu.yml" file. I will update that to "false" once we confirm that it is the appropriate verdict and correct way of specifying the same.

This is only partially true. While you are right that the expected result is wrong for the no-overflow property, I'am also concerned (and thus #1166) that we have benchmarks with overflow participating in the reachability category. I know the CI will complain if it encounters a benchmarks with the no-data-race property set to false and any expected behaviour for reachability. In principle this check should be performed for any property that implies undefined behaviour, e.g. no-overflow, but I'm not sure this is the case. Maybe @dbeyer can comment on this.

I think the main point is that any benchmarks with overflow will result in undefined behaviour and thus we cannot claim any expected result for reachability.

My other concern (again raised in #1166) is what to do with benchmarks that are currently in the reachability category but we discover they have undefined behaviour (overflow in this case). In the ideal world we would keep a copy of the benchmark, use it for the corresponding category (e.g. no-overflow, no-data-race) and solve the problem in the benchmark for reachability, but as you said (and I experience the same during the last week trying to fix the benchmarks I reported in #1166), this is not always easy.

divyeshunadkat · 2020-10-11T05:24:12Z

The commit b3fb86c fixes the overflow and hence in commit 7b81d2a I have changed the reachability verdict to true, thus restoring the program complexity such that the assertions in the program hold.

I believe that this fix can be applied to most benchmarks in nla-digibench (#1166 ) as well as other benchmarks with overflow (#1159 ).

hernanponcedeleon · 2020-10-11T10:04:02Z

The changes look fine to me and I think this PR can be merged.

dbeyer · 2020-10-11T10:54:40Z

If there is an overflow of an int variable, then this is undefined behavior.

An overflow of an unsigned int variable is not undefined behavior (the value just overflows modulo 2^).

@hernanponcedeleon If you see overflows of int variables in programs that have a property different from no-overflow with expected_verdict: false, then this should be fixed.

Programs with undefined behavior are only good to verify that the program has undefined behavior; other properties do not make sense.

@divyeshunadkat I think that your latest commits are unnecessary because the overflow was for unsigned int which is completely fine.

hernanponcedeleon · 2020-10-11T11:30:44Z

An overflow of an unsigned int variable is not undefined behavior (the value just overflows modulo 2^).

Thanks for the clarification, I was not aware of this. However this might impact the verification, so we should be cautious:

char x = 280;
assert(x == 24)

would be TRUE for the reachability category. I guess this is the kind of this the BitVector sub-category is dealing with.

@hernanponcedeleon If you see overflows of int variables in programs that have a property different from no-overflow with expected_verdict: false, then this should be fixed.

I'm not sure I fully understand. expected_verdict: false refers to no-overflow of to the property that is different?

divyeshunadkat · 2020-10-11T11:49:14Z

@divyeshunadkat I think that your latest commits are unnecessary because the overflow was for unsigned int which is completely fine.

The latest commits ( b3fb86c and 7b81d2a ) make the assertion to hold with the full complexity of the program and remove all the overflows. Without the change in b3fb86c , the values of variables can still wrap around and violate the property.

This is what was requested in #1105 and something that we have been aspiring to for all the programs with overflows in nla-digibench and other folders as well.

hernanponcedeleon · 2020-10-11T11:54:18Z

One possibility would be to keep three benchmarks

hard -> for no-overflow category
hard-u-l (currently called hardu) using unsigned and long long and expected result true
hard-u using unsigned by int variables and expected result false

divyeshunadkat · 2020-10-11T12:04:13Z

One possibility would be to keep three benchmarks

* hard -> for no-overflow category

* hard-u-l (currently called hardu) using unsigned and long long and expected result true

* hard-u using unsigned by int variables and expected result false

Sure. Will do.

Can I also make these changes for the other benchmarks in nla-digibench as mentioned in #1166 ?

hernanponcedeleon · 2020-10-11T12:08:52Z

Sure. It might be that for some benchmarks (I suspect egcd2 and egcd3) there is gonna be a violation even if we use long long so we have to be cautious about having the correct expected result, but at least we won't have undefined behavior if we use unsigned so I'm happy with this solution

Modifications in lcm2.c and hard-u.c

…nreach property

hernanponcedeleon

I think there are still problems with some of the benchmarks.

bresenham-ll: there are still overflows in lines 33,39,41,46. Applying the same solution as in line 28 should work.
egcd2-ll: it has int y and s*y*y which can overflow
geo3: it has int z, int a and a*z*y which can overflow
prod4br-ll: int x, int y and x*y which can overflow
egcd3-ll: I still suspect there is some problem with this benchmark, but it is hard to explain why. I'm running the verification on the code after using -O3 optimisations. The counter examples I get in the optimised code do not lead to counter examples in the original code. However I suspect this miss match between the result is due to the compiler performing aggressive optimisations due to undefined behaviour. Are you performing any analysis to check there are no overflows or just manually inspecting the benchmark? Also, in the last year, none of the tools managed to obtained a result for this benchmark so it might be just that the expected result is wrong.
hard-u: I don't think the expected result should be false. I imagine you are giving this expected result based on the counter example from Corrected verdict of hard.c (from PR #808) and added its correct version #920, but I don't think that is valid now that we use unsigned. If you still think the expected result should be false, can you please provide a counter example?

divyeshunadkat · 2020-10-14T17:11:58Z

@hernanponcedeleon Many thanks for the review.

Implicit type conversion takes care of the overflows in bresenham-ll, egcd2-ll, geo3-ll, and prod4br-ll.

For hard.c, use the initializations A = 3388579422u and B = 1501435237u in place of non-det assignments. In the concrete trace, both assertions in the second loop and the post-condition assert(B == d) are violated.

hernanponcedeleon · 2020-10-15T06:55:05Z

If by Implicit type conversion you mean that whenever we have an expression where one of the operands is e.g. long long and the other is int the latter is implicitly converted to long long, I think this is not completely true. This is true in the case where the expression is atomic (does not have sub-expression).

The problem are non atomic expressions, e.g. __VERIFIER_assert( 2*Y*x - 2*X*y - X + 2*Y - v == 0);. In this case v is long long but all the other variables are int. The intermediate values will be saved in local variables, and most of those intermediate variables will have type int and thus they can overflow.

Here are some parts of the llvm code generated where the above becomes clear:

call void @assume_abort_if_not(i32 %11)
%12 = load i32, i32* %3, align 4
%13 = sext i32 %12 to i64
%14 = mul nsw i64 2, %13

this corresponds to

assume_abort_if_not(X < 2147483647);
v = ((long long) 2 * Y) - X;

and the casting is correctly converted to a 64-bits multiplication.

However __VERIFIER_assert( 2*Y*x - 2*X*y - X + 2*Y - v == 0); is converted into

%19 = load i32, i32* %3, align 4
%20 = mul nsw i32 2, %19
%21 = load i32, i32* %4, align 4
%22 = mul nsw i32 %20, %21
%23 = load i32, i32* %2, align 4
%24 = mul nsw i32 2, %23
%25 = load i32, i32* %5, align 4
%26 = mul nsw i32 %24, %25
%27 = sub nsw i32 %22, %26
%28 = load i32, i32* %2, align 4
%29 = sub nsw i32 %27, %28
%30 = load i32, i32* %3, align 4
%31 = mul nsw i32 2, %30
%32 = add nsw i32 %29, %31
%33 = sext i32 %32 to i64
%34 = load i64, i64* %6, align 8
%35 = sub nsw i64 %33, %34
%36 = icmp eq i64 %35, 0
%37 = zext i1 %36 to i32
call void @__VERIFIER_assert(i32 %37)

where you can see the multiplication are only 32-bits and thus the overflow is possible.

I think the above gives problem in bresenham-ll, egcd2-ll, geo3-ll, and prod4br-ll as I mentioned in my previous comment, but also in fermat1-ll and fermat1-ll.

Thanks for the counter example for hard-u, it looks you are right.

Note: there was a small typo (fixed already) in my previous comment: the long comment was about egcd3-ll and not egcd2-ll.

PhilippWendler · 2020-10-15T07:08:49Z

The problem are non atomic expressions, e.g. __VERIFIER_assert( 2*Y*x - 2*X*y - X + 2*Y - v == 0);. In this case v is long long but all the other variables are int. The intermediate values will be saved in local variables, and most of those intermediate variables will have type int and thus they can overflow.

For completeness: C does not have atomic vs. non-atomic expressions, and there are no intermediate variables involved here. What LLVM does is just an implementation detail of one particular implementation.

What the C standard says, however, is that for certain operators (including the arithmetic operators) the "usual arithmetic conversions" (such as converting the operands to the "common real type") are performed before the operator is evaluated. This makes it clear that one has to look at each operator individually, deduce the common real type from its operand types according to §6.3.1.8 of the C standard and then check whether it can overflow. Expressions and in particular whether an operator is part of a larger expression are not relevant in this process. (Source: search for "usual arithmetic conversions" in the C standard)

So this does not change the outcome of what you claim, this is just the official reason for it.

hernanponcedeleon · 2020-10-15T07:18:33Z

Thanks @PhilippWendler for the clarification.

divyeshunadkat · 2020-10-15T09:07:18Z

@hernanponcedeleon @PhilippWendler Many thanks for your detailed, insightful and well articulated comments.

This begs the question that can an llvm pass be implemented that can propagate the type conversions to sub-expressions (if possible in an optimal way)? I believe that such a pass would be useful to many tools that are based on llvm IR, especially when detecting overflows.

I resorted to implicit type conversions to retain the readability of these files.

divyeshunadkat · 2020-10-17T10:29:53Z

@hernanponcedeleon @PhilippWendler @dbeyer

Are there any other changes required before this PR can be merged?

hernanponcedeleon · 2020-10-17T10:31:36Z

I haven't seen any commits after my last comment, so from my point of view, the benchmarks still have problems.

llvm has a local handling of the implicity casting when generating the type cast instructions in the IR. This commit can be reverted if llvm starts looking at the entire expression as given in a C progam, rather than just the type of the variables involved in the current expression, before generating the needed type casts.

divyeshunadkat · 2020-10-20T06:26:26Z

@hernanponcedeleon

Have pushed the changes with casts for each sub-expression that would require one.

hernanponcedeleon · 2020-10-20T09:39:03Z

I still get problems with 3 benchmarks.

For egcd3-ll, I tried many fixes, but nothing worked. Since in previous years no tool managed to solve this benchmark, I still have some hope that the expected result might be wrong, but this is not something to be tackled in this PR.

I managed to implement solutions for the overflow in egcd-2 and bresenham-ll. You can take a look to my solutions here and here. If you agree with them, please add them to this PR and I'll approve.

For documentation, I explain the problems and solutions here.

bresenham-ll
Problem: multiplications Y*x, X*y and 2 * Y can still overflow
Solution: add a (long long) cast

egcd-2
Problem: you added a cast to avoid overflow of x*y and y*y. However since these are then multiplied by q and s, the result can again overflow
Solution: I added an assumption that x*y and y*y will fit in 32 bits (originally, all variables where ints so this assumption does not modify the initial intended behaviour of the benchmark). The way in which q and s are updated guarantee that q*x*y and s*y*y will not overflow under this assumption.

Can be reverted with same reason as the previous commit.

hernanponcedeleon

Looks go now.

@dbeyer you can merge already.

Thanks @divyeshunadkat for all the work!

divyeshunadkat · 2020-10-22T05:00:12Z

Many thanks @hernanponcedeleon for your meticulous reviews. Happy that together we resolved the issues in the benchmarks in a way that is useful to the competition. :) 👍

divyeshunadkat · 2020-10-22T05:01:59Z

@dbeyer I have fixed the last change that you had requested about specifying the no-overflow property. The C files do not have any pre-processor directives and hence pre-processed files are not added. Kindly let me know if I must still add the pre-processed files or if there are any other changes required. Thanks.

divyeshunadkat and others added 2 commits October 5, 2020 10:25

Only overflow check for hard.c and a new benchmark with unsigned inte…

b10e87c

…gers

Update hard.yml

21cbb7d

The program has an overflow so the verdict should be true.

dbeyer approved these changes Oct 7, 2020

View reviewed changes

dbeyer suggested changes Oct 7, 2020

View reviewed changes

divyeshunadkat added 3 commits October 7, 2020 15:49

Revert "Update hard.yml"

fbebf3c

This reverts commit 21cbb7d.

Merge branch 'master' of https://github.com/sosy-lab/sv-benchmarks

88c0a7d

corrected the file name in hardu.c

d0f58e6

hernanponcedeleon mentioned this pull request Oct 9, 2020

Reachability benchmarks with overflow #1166

Open

hernanponcedeleon self-requested a review October 9, 2020 14:48

hernanponcedeleon suggested changes Oct 9, 2020

View reviewed changes

divyeshunadkat added 2 commits October 11, 2020 09:46

Attempt to fix the overflow

b3fb86c

Resolving overflow issue makes the reachability property true

7b81d2a

divyeshunadkat mentioned this pull request Oct 11, 2020

PR #920 needs some fixes (undefined behavior was overlooked, fixed task simplified the program too much) #1105

Open

divyeshunadkat requested review from dbeyer and hernanponcedeleon October 11, 2020 05:21

hernanponcedeleon previously approved these changes Oct 11, 2020

View reviewed changes

PhilippWendler added the C Task in language C label Oct 12, 2020

hernanponcedeleon mentioned this pull request Oct 12, 2020

Undefined behavior overlooked in a couple of array programs in PR #835 #1173

Closed

divyeshunadkat added 3 commits October 14, 2020 02:09

Renamed

ba206d3

Changed the property from unreach to no-overflow

bcd2451

Modifications in lcm2.c and hard-u.c

Benchmarks without overflows while retaining program complexity and u…

a36f931

…nreach property

divyeshunadkat dismissed hernanponcedeleon’s stale review via a36f931 October 13, 2020 20:45

divyeshunadkat requested a review from hernanponcedeleon October 13, 2020 20:47

divyeshunadkat changed the title ~~Only overflow check for hard.c and a new benchmark hardu.c with unsigned integers~~ Fixing benchmarks with overflows from nla-digibench Oct 13, 2020

Corrections

af1524f

hernanponcedeleon suggested changes Oct 14, 2020

View reviewed changes

divyeshunadkat requested a review from hernanponcedeleon October 14, 2020 17:13

divyeshunadkat added 2 commits October 20, 2020 10:16

Missed adding the type cast to avoid the overflow

ada9e41

Added a few more type casts to avoid overflow in llvm IR

9166aa3

Can be reverted with same reason as the previous commit.

hernanponcedeleon approved these changes Oct 21, 2020

View reviewed changes

dbeyer approved these changes Oct 24, 2020

View reviewed changes

dbeyer merged commit b7fe5c3 into sosy-lab:master Oct 24, 2020

This was referenced Oct 30, 2020

Broken generation script #1198

Closed

Update nla-digbench-scaling, account for recently found overflows in nla-digbench #1200

Closed

hernanponcedeleon mentioned this pull request Oct 30, 2020

Added missing expected result for no-overflow property #1204

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Fixing benchmarks with overflows from nla-digibench #1155

Fixing benchmarks with overflows from nla-digibench #1155

divyeshunadkat commented Oct 5, 2020 •

edited

Loading

dbeyer commented Oct 7, 2020

dbeyer left a comment

divyeshunadkat commented Oct 7, 2020

hernanponcedeleon commented Oct 9, 2020

hernanponcedeleon left a comment

divyeshunadkat commented Oct 10, 2020 •

edited

Loading

hernanponcedeleon commented Oct 10, 2020

divyeshunadkat commented Oct 11, 2020 •

edited

Loading

hernanponcedeleon commented Oct 11, 2020

dbeyer commented Oct 11, 2020

hernanponcedeleon commented Oct 11, 2020

divyeshunadkat commented Oct 11, 2020

hernanponcedeleon commented Oct 11, 2020

divyeshunadkat commented Oct 11, 2020

hernanponcedeleon commented Oct 11, 2020

hernanponcedeleon left a comment •

edited

Loading

divyeshunadkat commented Oct 14, 2020

hernanponcedeleon commented Oct 15, 2020

PhilippWendler commented Oct 15, 2020

hernanponcedeleon commented Oct 15, 2020

divyeshunadkat commented Oct 15, 2020

divyeshunadkat commented Oct 17, 2020

hernanponcedeleon commented Oct 17, 2020

divyeshunadkat commented Oct 20, 2020

hernanponcedeleon commented Oct 20, 2020

hernanponcedeleon left a comment

divyeshunadkat commented Oct 22, 2020

divyeshunadkat commented Oct 22, 2020

Fixing benchmarks with overflows from nla-digibench #1155

Fixing benchmarks with overflows from nla-digibench #1155

Conversation

divyeshunadkat commented Oct 5, 2020 • edited Loading

dbeyer commented Oct 7, 2020

dbeyer left a comment

Choose a reason for hiding this comment

divyeshunadkat commented Oct 7, 2020

hernanponcedeleon commented Oct 9, 2020

hernanponcedeleon left a comment

Choose a reason for hiding this comment

divyeshunadkat commented Oct 10, 2020 • edited Loading

hernanponcedeleon commented Oct 10, 2020

divyeshunadkat commented Oct 11, 2020 • edited Loading

hernanponcedeleon commented Oct 11, 2020

dbeyer commented Oct 11, 2020

hernanponcedeleon commented Oct 11, 2020

divyeshunadkat commented Oct 11, 2020

hernanponcedeleon commented Oct 11, 2020

divyeshunadkat commented Oct 11, 2020

hernanponcedeleon commented Oct 11, 2020

hernanponcedeleon left a comment • edited Loading

Choose a reason for hiding this comment

divyeshunadkat commented Oct 14, 2020

hernanponcedeleon commented Oct 15, 2020

PhilippWendler commented Oct 15, 2020

hernanponcedeleon commented Oct 15, 2020

divyeshunadkat commented Oct 15, 2020

divyeshunadkat commented Oct 17, 2020

hernanponcedeleon commented Oct 17, 2020

divyeshunadkat commented Oct 20, 2020

hernanponcedeleon commented Oct 20, 2020

hernanponcedeleon left a comment

Choose a reason for hiding this comment

divyeshunadkat commented Oct 22, 2020

divyeshunadkat commented Oct 22, 2020

divyeshunadkat commented Oct 5, 2020 •

edited

Loading

divyeshunadkat commented Oct 10, 2020 •

edited

Loading

divyeshunadkat commented Oct 11, 2020 •

edited

Loading

hernanponcedeleon left a comment •

edited

Loading