Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ubsan fixes for various tcpdump printers #1012

Open
wants to merge 5 commits into
base: master
Choose a base branch
from

Conversation

fenner
Copy link
Contributor

@fenner fenner commented Oct 11, 2022

I ran some fuzzing with ubsan enabled. These are my fixes to the problems that this fuzzing found.

(This was kind of an accident; I didn't realize that ubsan was enabled and I got a bunch of surprise fuzz failures, but I figure these fixes are probably helpful anyway)

@infrastation
Copy link
Member

Thank you for preparing these changes. Do you still have the UBSan messages? Does make check pass in your working copy?

@fenner
Copy link
Contributor Author

fenner commented Oct 12, 2022

Thank you for preparing these changes. Do you still have the UBSan messages? Does make check pass in your working copy?

I will recreate the UBSan messages, by removing the patches and rebuilding and using the test pcaps. I didn't keep records when I did this work, only pcaps. I'll put the corresponding messages in with each commit that fixes it?

The out files were generated with 4.99.1, so if there are changes in master that could explain the mismatch. I will regenerate them.

@fenner
Copy link
Contributor Author

fenner commented Oct 12, 2022

P.S. I will also squash the "Address integer overflows" commit into the "ISO:" commit once my discussion with fxlb is complete.

@infrastation
Copy link
Member

Placing the messages into respective commits would be perfect, thank you!

@fenner
Copy link
Contributor Author

fenner commented Oct 13, 2022

The integer overflow commit is still separate, because I want to validate it against a separate implementation of this checksum implementation that I have access to.

I updated the .out files to pass with master in this commit, but I see illumos has already failed so I will investigate the check failure(s) and update.

@infrastation
Copy link
Member

Rebased on the current master and added a fixup, the expectation is that this pull request passes all CI checks.

@fenner
Copy link
Contributor Author

fenner commented Oct 24, 2022

I didn't think it was worth pushing my fix for the illumos build, but I have done so now to avoid getting confused by

Your branch and 'origin/fenner-ubsan-fixes' have diverged,
and have 6 and 14 different commits each, respectively.

What's pending here is extracting the proprietary iso checksum algorithm into a form that I can test with. Once I have validated that our checksum algorithm and tcpdump's have the same result, I will squash the checksum fixes into a single commit, push that and at that point I think it will be ready to merge.

@fenner fenner marked this pull request as draft October 24, 2022 15:23
@infrastation
Copy link
Member

Alright, thank you for the clarifications.

@fenner fenner marked this pull request as ready for review November 27, 2022 18:32
@fenner
Copy link
Contributor Author

fenner commented Nov 27, 2022

I apologize for the delay. In order to double-check that the new checksum algorithm gets the right value, I have processed all of the pcap files that @fxlb provided using this main.c, and both the (proprietary) gated and the updated tcpdump algorithm got the same value for each packet.

#include <pcap/pcap.h>

void callback(u_char *user, const struct pcap_pkthdr *h,
                                                const u_char *bytes) {
   int gated_cksum, tcpdump_cksum;
   gated_cksum = gated_iso_cksum( bytes, h->caplen, bytes );
   tcpdump_cksum = create_osi_cksum( bytes, 0, h->caplen );
   if ( gated_cksum != tcpdump_cksum ) {
      fprintf( stderr, "*** NOT MATCHING *** " );
   }
   fprintf( stderr, "gated %04x tcpdump %04x\n", gated_cksum, tcpdump_cksum );
}

int main(int argc, const char **argv) {
   char errbuf[PCAP_ERRBUF_SIZE];
   pcap_t *p;
   p = pcap_open_offline( argv[ 1 ], errbuf );
   pcap_loop( p, -1, callback, NULL );
}

It's not enough for the *packet* to be able to contain the RD;
the route data also has to be long enough; otherwise, we will
try to shift a negative length left in order to pass it to
bgp_vpn_ip_print()

print-bgp.c:1848:9: runtime error: left shift of negative value -8
SUMMARY: UndefinedBehaviorSanitizer: undefined-behavior print-bgp.c:1848:9 in
…ecksum calculation

The fletcher checksum calculation would sometimes left-shift
a negative number, which is an undefined operation.  Rework the
code to avoid this.

checksum.c:186:20: runtime error: left shift of negative value -36
SUMMARY: UndefinedBehaviorSanitizer: undefined-behavior checksum.c:186:20 in

Unlike some checksum routines that use the defined semantics of
2's-complement unsigned overflow to their advantage, this one
gets the wrong value if it is allowed to overflow, due to the
use of mod-255.

Convert c1 to uint64_t to avoid overflow

checksum.c:163:16: runtime error: unsigned integer overflow: NNN + NNN cannot be represented in type 'unsigned int'

Use integers during subtraction to avoid implicit conversion to unsigned
when calculating both x and y

checksum.c:172:18: runtime error: unsigned integer overflow: NNN - NNN cannot be represented in type 'unsigned int'
checksum.c:172:9: runtime error: implicit conversion from type 'unsigned int' of value NNN (32-bit, unsigned) to type 'int' changed the value to -NNN (32-bit, signed)
checksum.c:173:12: runtime error: unsigned integer overflow: NNN - NNN cannot be represented in type 'unsigned int'
checksum.c:173:9: runtime error: implicit conversion from type 'unsigned int' of value NNN (32-bit, unsigned) to type 'int' changed the value to -NNN (32-bit, signed)
When converting an integer from ASN.1, use an unsigned value
for the partial result and assign it to the integer part of
the union at the end, to avoid shifting a negative number left.

print-snmp.c:545:19: runtime error: left shift of negative value -1
SUMMARY: UndefinedBehaviorSanitizer: undefined-behavior print-snmp.c:545:19 in

OID elements are unsigned; a large-enough oid value could result
in the undefined behavior of shifting a signed integer left through
the sign bit, so simply store them as unsigned.

print-snmp.c:769:11: runtime error: left shift of 268435455 by 7 places cannot be represented in type 'int'
SUMMARY: UndefinedBehaviorSanitizer: undefined-behavior print-snmp.c:769:11 in
Check for truncation before doing pointer arithmetic to point
to the end of the packet.

print-lwres.c:294:10: runtime error: addition of unsigned offset to 0xf3b032be overflowed to 0x9652d560
SUMMARY: UndefinedBehaviorSanitizer: undefined-behavior print-lwres.c:294:10 in
print-lwres.c:549:29: runtime error: addition of unsigned offset to 0xf3b032be overflowed to 0x9652d560
SUMMARY: UndefinedBehaviorSanitizer: undefined-behavior print-lwres.c:549:29 in
Handle ls_length shorter than sizeof(lsa_hdr) in the same way
as OSPF.

Use a u_int32 to hold a loop variable initialized with GET_BE_U_4.

print-ospf6.c:817:46: runtime error: signed integer overflow: -2147483648 - 1 cannot be represented in type 'int'
SUMMARY: UndefinedBehaviorSanitizer: undefined-behavior print-ospf6.c:817:46 in
fxlb pushed a commit to fxlb/tcpdump that referenced this pull request Apr 19, 2023
When converting an integer from ASN.1, use an unsigned value
for the partial result and assign it to the integer part of
the union at the end, to avoid shifting a negative number left.

print-snmp.c:545:19: runtime error: left shift of negative value -1
SUMMARY: UndefinedBehaviorSanitizer: undefined-behavior print-snmp.c:545:19

OID elements are unsigned; a large-enough oid value could result
in the undefined behavior of shifting a signed integer left through
the sign bit, so simply store them as unsigned.

print-snmp.c:751:11: runtime error: left shift of 268435455 by 7 places
  cannot be represented in type 'int'
SUMMARY: UndefinedBehaviorSanitizer: undefined-behavior print-snmp.c:751:11

[Part of the PR the-tcpdump-group#1012]
@fxlb
Copy link
Member

fxlb commented Apr 19, 2023

I have commited the fix for SNMP.
I tried to test the fix for lwres (with/without) without success.
Removing the patch, I don't get "addition of unsigned offset to xx be overflowed to yy" messages.
What are OS/compiler/configure options for this one?

@fenner
Copy link
Contributor Author

fenner commented Apr 20, 2023

What are OS/compiler/configure options for this one?

It's likely that some of these are specific to 32-bit builds (e.g., i386_el7).

@fxlb
Copy link
Member

fxlb commented Apr 20, 2023

What are OS/compiler/configure options for this one?

It's likely that some of these are specific to 32-bit builds (e.g., i386_el7).

You are rigth. I need clang, sanitize undefined + address and 32-bit mode.
With it I reproduce the first error in print-lwres.c, line 294.
Was the error on line 549 obtained with the same pcap?

@fenner
Copy link
Contributor Author

fenner commented Apr 26, 2023

Was the error on line 549 obtained with the same pcap?

I'm sorry, I did this work over a year ago and did not expect to have to provide detailed notes. I committed all of my work to the Arista repository, and this is the only lwres pcap that I have in the Arista repository, so my best guess is that it was.

@fxlb
Copy link
Member

fxlb commented Apr 27, 2023

I understand. Can you run the program again with the lwres pcap to check if it displays one or two error messages?

@fenner
Copy link
Contributor Author

fenner commented May 2, 2023

I understand. Can you run the program again with the lwres pcap to check if it displays one or two error messages?

I tried again (with a tcpdump-4.99.2 base) and only get the one:

print-lwres.c:294:10: runtime error: addition of unsigned offset to 0xf40032be overflowed to 0x96a2d560
SUMMARY: UndefinedBehaviorSanitizer: undefined-behavior print-lwres.c:294:10 in 

fxlb pushed a commit to fxlb/tcpdump that referenced this pull request May 2, 2023
Check for truncation before doing pointer arithmetic to point
to the end of the packet.

print-lwres.c:294:10: runtime error: addition of unsigned offset to
  0xf3b032be overflowed to 0x9652d560
SUMMARY: UndefinedBehaviorSanitizer: undefined-behavior print-lwres.c:294:10

[Part of the PR the-tcpdump-group#1012]
@fxlb
Copy link
Member

fxlb commented May 2, 2023

Same for tcpdump-4.99.3, tcpdump-4.99.4, master. Thank you!
I have commited the fix for lwres with, in commit message, the error on line 294.

if (ls_length < sizeof(struct lsa_hdr)) {
ND_PRINT("\n\t Bogus length %u < header (%zu)", ls_length,
sizeof(struct lsa_hdr));
goto trunc;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's perhaps more an "invalid" length case than a "truncated" case ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's perhaps more an "invalid" length case than a "truncated" case ?

My thinking is "this TLV doesn't have enough data to contain the header of the TLV, so, the TLV itself is truncated".

All 3 callers of ospf6_print_lshdr() use if (ospf6_print_lshdr(...)) return(1);. From a skim, I don't see a reason for any of them to distinguish between "truncated" and "invalid in this specific way".

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The notion of truncation may have evolved over the years, but currently a truncation occurs when we try to get len bytes in the packet buffer and we can't.
(checks via GET_*, ND_TTEST_*, ND_TCHECK_*, etc. macros). For ND_TCHECK_* macros, if ND_LONGJMP_FROM_TCHECK is defined for a printer, a longjmp occurs, no more goto trunc (Printers "Modernized" for the 5.0.0 release, remain 34 to do.)
The macros for testing invalid length cases are the ND_ICHECK*, with a goto invalid.
If we add in line 396 the following command, we get the value 470, with the test file ospf-signed-integer-ubsan.pcap:
ND_PRINT("[[%u]]", ND_BYTES_AVAILABLE_AFTER(lshp->ls_length));
Thus there is no truncation here.
Tshark says also the packet is malformed (like our invalid.), not [Packet size limited during capture: XXX truncated]:

[...]
            Link State ID: 145.245.255.254
            Advertising Router: 204.122.189.255
            Sequence Number: 0x02000000
            Checksum: 0x0000
            Length: 0
            [Expert Info (Warning/Protocol): Unknown LSA Type 224]
                [Unknown LSA Type 224]
                [Severity level: Warning]
                [Group: Protocol]
[Malformed Packet: OSPF]
    [Expert Info (Error/Malformed): Malformed Packet (Exception occurred)]
        [Malformed Packet (Exception occurred)]
        [Severity level: Error]
        [Group: Malformed]

Perhaps this change should be updated after a modernization of print-ospf6.c.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perhaps. I'm not prepared to make a major rework as a prerequisite for my contribution, so if it's better to have undefined behavior than to print an incorrect truncation message, great, feel free to discard my changes.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If the proposed change makes the code safer, perhaps we could address any cosmetic issues after merging.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My first idea was to hold the change waiting the modernization of print-ospf6.c, but as Denis says we can do the fix as a temporary change. I will commit this part. Thanks for your work.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My first idea was to hold the change waiting the modernization of print-ospf6.c, but as Denis says we can do the fix as a temporary change. I will commit this part. Thanks for your work.

Thank you. Undefined behavior does not necessarily lead to incorrect behavior, but, if you enable undefined behavior detection in your fuzzing environment, having undefined behavior can cause so much noise that it can prevent you from discovering real problems via fuzzing.

fxlb pushed a commit to fxlb/tcpdump that referenced this pull request May 24, 2023
Handle ls_length shorter than sizeof(lsa_hdr) in the same way as OSPF.

Use a u_int32 to hold a loop variable initialized with GET_BE_U_4.

print-ospf6.c:815:46: runtime error: signed integer overflow:
-2147483648 - 1 cannot be represented in type 'int'
SUMMARY: UndefinedBehaviorSanitizer: undefined-behavior print-ospf6.c:817:46

[Part of the PR the-tcpdump-group#1012]
guyharris added a commit that referenced this pull request Aug 21, 2023
This 1) makes sure that GET_ macros are used to extract data from the
structure (which they already were) and 2) avoids undefined behavior if
the structure isn't aligned on the appropriate memory boundary.

Fixes #1054.  (The SNMP issues are fixed by changes for #1012.)
fxlb pushed a commit to fxlb/tcpdump that referenced this pull request Aug 22, 2023
It's not enough for the *packet* to be able to contain the RD;
the route data also has to be long enough; otherwise, we will
try to shift a negative length left in order to pass it to
bgp_vpn_ip_print()

print-bgp.c:1848:9: runtime error: left shift of negative value -8
SUMMARY: UndefinedBehaviorSanitizer: undefined-behavior print-bgp.c:1848:9

[Part of the PR the-tcpdump-group#1012]
fxlb pushed a commit to fxlb/tcpdump that referenced this pull request Oct 12, 2023
When converting an integer from ASN.1, use an unsigned value
for the partial result and assign it to the integer part of
the union at the end, to avoid shifting a negative number left.

print-snmp.c:545:19: runtime error: left shift of negative value -1
SUMMARY: UndefinedBehaviorSanitizer: undefined-behavior print-snmp.c:545:19

OID elements are unsigned; a large-enough oid value could result
in the undefined behavior of shifting a signed integer left through
the sign bit, so simply store them as unsigned.

print-snmp.c:751:11: runtime error: left shift of 268435455 by 7 places
  cannot be represented in type 'int'
SUMMARY: UndefinedBehaviorSanitizer: undefined-behavior print-snmp.c:751:11

[Part of the PR the-tcpdump-group#1012]

(cherry picked from commit df5cba8)
fxlb pushed a commit to fxlb/tcpdump that referenced this pull request Oct 12, 2023
Check for truncation before doing pointer arithmetic to point
to the end of the packet.

print-lwres.c:294:10: runtime error: addition of unsigned offset to
  0xf3b032be overflowed to 0x9652d560
SUMMARY: UndefinedBehaviorSanitizer: undefined-behavior print-lwres.c:294:10

[Part of the PR the-tcpdump-group#1012]

(cherry picked from commit b016347)
fxlb pushed a commit to fxlb/tcpdump that referenced this pull request Oct 12, 2023
Handle ls_length shorter than sizeof(lsa_hdr) in the same way as OSPF.

Use a u_int32 to hold a loop variable initialized with GET_BE_U_4.

print-ospf6.c:815:46: runtime error: signed integer overflow:
-2147483648 - 1 cannot be represented in type 'int'
SUMMARY: UndefinedBehaviorSanitizer: undefined-behavior print-ospf6.c:817:46

[Part of the PR the-tcpdump-group#1012]

(cherry picked from commit 7f3c090)
fxlb pushed a commit to fxlb/tcpdump that referenced this pull request Oct 13, 2023
This 1) makes sure that GET_ macros are used to extract data from the
structure (which they already were) and 2) avoids undefined behavior if
the structure isn't aligned on the appropriate memory boundary.

Fixes the-tcpdump-group#1054.  (The SNMP issues are fixed by changes for the-tcpdump-group#1012.)

(cherry picked from commit a0b7859)
fxlb pushed a commit to fxlb/tcpdump that referenced this pull request Oct 13, 2023
It's not enough for the *packet* to be able to contain the RD;
the route data also has to be long enough; otherwise, we will
try to shift a negative length left in order to pass it to
bgp_vpn_ip_print()

print-bgp.c:1848:9: runtime error: left shift of negative value -8
SUMMARY: UndefinedBehaviorSanitizer: undefined-behavior print-bgp.c:1848:9

[Part of the PR the-tcpdump-group#1012]

(cherry picked from commit 5a5646b)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

Successfully merging this pull request may close these issues.

3 participants