Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Added detection for untrusted domains in script content - edited existing library and added new query and tests ("the Polyfill PR") #16886

Merged
merged 22 commits into from
Jul 12, 2024

Conversation

aegilops
Copy link
Contributor

@aegilops aegilops commented Jul 1, 2024

Polyfill.io was a popular JavaScript polyflll Content Delivery Network (CDN), used to support new web browser standards on older browsers.

In February 2024 the domain was sold, and in June 2024 it was widely publicised that the domain had been used to serve malicious scripts. It was taken down later in that month, leaving a window where sites that used the service could have been compromised.

Subresource Integrity (SRI) would have prevented the undetected replacement of the Polyfill script code.

This CodeQL query builds on a library, semmle/javascript/security/FunctionalityFromUntrustedSource.qll, which was abstracted from an existing query, javascript/ql/src/Security/CWE-830/FunctionalityFromUntrustedSource.ql, to detect specific cases where a script from polyfill.io or cdn.polyfill.io was used without Subresource Integrity, and so the service using it was exposed to compromise.

I've modified that existing library to add polyfill.io and cdn.polyfill.io as domains that require SRI, which makes the existing query detect such uses, and I have added a getUrl() predicate to classes in the library so that I can filter down only to results from polyfill.io in a new query.

I added a specific test for polyfill.io use in its most common context, in an HTML script tag.

This query will work for cases in HTML or in JavaScript code, but will not find cases in any other languages such as PHP or Python where server-side scripting is used to generate client-side HTML or JavaScript.

A code search across github.com hosted sites will find more results manually.

@aegilops aegilops requested a review from a team as a code owner July 1, 2024 15:33
Copy link
Contributor

github-actions bot commented Jul 1, 2024

QHelp previews:

javascript/ql/src/Security/CWE-830/PolyfillIOCompromisedScript.qhelp

errors/warnings:

/home/runner/work/codeql/codeql/javascript/ql/src/Security/CWE-830/PolyfillIOCompromisedScript.qhelp:9:6: element "p" not allowed here; expected the element end-tag, text or element "a", "b", "code", "em", "i", "img", "strong", "sub", "sup" or "tt"
/home/runner/work/codeql/codeql/javascript/ql/src/Security/CWE-830/PolyfillIOCompromisedScript.qhelp:11:6: element "p" not allowed here; expected the element end-tag, text or element "a", "b", "code", "em", "i", "img", "strong", "sub", "sup" or "tt"
/home/runner/work/codeql/codeql/javascript/ql/src/Security/CWE-830/PolyfillIOCompromisedScript.qhelp:15:6: element "p" not allowed here; expected the element end-tag, text or element "a", "b", "code", "em", "i", "img", "strong", "sub", "sup" or "tt"
/home/runner/work/codeql/codeql/javascript/ql/src/Security/CWE-830/PolyfillIOCompromisedScript.qhelp:17:6: element "p" not allowed here; expected the element end-tag, text or element "a", "b", "code", "em", "i", "img", "strong", "sub", "sup" or "tt"
/home/runner/work/codeql/codeql/javascript/ql/src/Security/CWE-830/PolyfillIOCompromisedScript.qhelp:25:6: element "p" not allowed here; expected the element end-tag, text or element "a", "b", "code", "em", "i", "img", "strong", "sub", "sup" or "tt"
/home/runner/work/codeql/codeql/javascript/ql/src/Security/CWE-830/PolyfillIOCompromisedScript.qhelp:34:6: element "p" not allowed here; expected the element end-tag, text or element "a", "b", "code", "em", "i", "img", "strong", "sub", "sup" or "tt"
/home/runner/work/codeql/codeql/javascript/ql/src/Security/CWE-830/PolyfillIOCompromisedScript.qhelp:40:4: The element type "p" must be terminated by the matching end-tag "</p>".
A fatal error occurred: 1 qhelp files could not be processed.

Copy link
Contributor

github-actions bot commented Jul 1, 2024

QHelp previews:

javascript/ql/src/Security/CWE-830/PolyfillIOCompromisedScript.qhelp

errors/warnings:

/home/runner/work/codeql/codeql/javascript/ql/src/Security/CWE-830/PolyfillIOCompromisedScript.qhelp: Could not find sample polyfill-cloudflare-check.html
A fatal error occurred: 1 qhelp files could not be processed.

Copy link
Contributor

github-actions bot commented Jul 1, 2024

QHelp previews:

javascript/ql/src/Security/CWE-830/FunctionalityFromUntrustedDomain.qhelp

Untrusted domain used in script or other content

Content Delivery Networks (CDNs) are used to deliver content to users quickly and efficiently. However, they can change hands or be operated by untrustworthy owners, risking the security of the sites that use them. Some CDN domains are operated by entities that have used CDNs to deliver malware, which this query identifies.

For example, polyfill.io was a popular JavaScript CDN, used to support new web browser standards on older browsers. In February 2024 the domain was sold, and in June 2024 it was publicised that the domain had been used to serve malicious scripts. It was taken down later in that month, leaving a window where sites that used the service could have been compromised. The same operator runs several other CDNs, undermining trust in those too.

Including a resource from an untrusted source or using an untrusted channel may allow an attacker to include arbitrary code in the response. When including an external resource (for example, a script element) on a page, it is important to ensure that the received data is not malicious.

Even when https is used, an untrustworthy operator might deliver malware.

See the [`CUSTOMIZING.md`](https://github.com/github/codeql/blob/main/javascript/ql/src/Security/CWE-830/CUSTOMIZING.md) file in the source code for this query for information on how to extend the list of untrusted domains used by this query.

Recommendation

Carefully research the ownership of a Content Delivery Network (CDN) before using it in your application.

If you find code that originated from an untrusted domain in your application, you should review your logs to check for compromise.

To help mitigate the risk of including a script that could be compromised in the future, consider whether you need to use polyfill or another library at all. Modern browsers do not require a polyfill, and other popular libraries were made redundant by enhancements to HTML 5.

If you do need a polyfill service or library, move to using a CDN that you trust.

When you use a script or link element, you should check for subresource integrity (SRI), and pin to a hash of a version of the service that you can trust (for example, because you have audited it for security and unwanted features). A dynamic service cannot be easily used with SRI. Nevertheless, it is possible to list multiple acceptable SHA hashes in the integrity attribute, such as hashes for the content required for the major browsers used by your users.

You can also choose to self-host an uncompromised version of the service or library.

Example

The following example loads the Polyfill.io library from the polyfill.io CDN. This use was open to malicious scripts being served by the CDN.

<html>
    <head>
        <title>Polyfill.io demo</title>
        <script src="https://cdn.polyfill.io/v2/polyfill.min.js" crossorigin="anonymous"></script>
    </head>
    <body>
        ...
    </body>
</html>

Instead, load the Polyfill library from a trusted CDN, as in the next example:

<html>
    <head>
        <title>Polyfill demo - Cloudflare hosted with pinned version (but no integrity checking, since it is dynamically generated)</title>
        <script src="https://cdnjs.cloudflare.com/polyfill/v3/polyfill.min.js?version=4.8.0" crossorigin="anonymous"></script>
    </head>
    <body>
        ...
    </body>
</html>

If you know which browsers are used by the majority of your users, you can list the hashes of the polyfills for those browsers:

<html>
    <head>
        <title>Polyfill demo - Cloudflare hosted with pinned version (with integrity checking for a *very limited* browser set - just an example!)</title>
        <script src="https://cdnjs.cloudflare.com/polyfill/v3/polyfill.min.js?version=4.8.0" integrity="sha384-i0IGVuZBkKZqwXTD4CH4kcksIbFx7WKFMdxN8zUhLFHpLdELF0ym0jxa6UvLhW8/ sha384-3d4jRKquKl90C9aFG+eH4lPJmtbPHgACWHrp+VomFOxF8lzx2jxqeYkhpRg18UWC" crossorigin="anonymous"></script>
    </head>
    <body>
        ...
    </body>
</html>

References

javascript/ql/src/Security/CWE-830/FunctionalityFromUntrustedSource.qhelp

Inclusion of functionality from an untrusted source

Including a resource from an untrusted source or using an untrusted channel may allow an attacker to include arbitrary code in the response. When including an external resource (for example, a script element or an iframe element) on a page, it is important to ensure that the received data is not malicious.

When including external resources, it is possible to verify that the responding server is the intended one by using an https URL. This prevents a MITM (man-in-the-middle) attack where an attacker might have been able to spoof a server response.

Even when https is used, an attacker might still compromise the server. When you use a script element, you can check for subresource integrity - that is, you can check the contents of the data received by supplying a cryptographic digest of the expected sources to the script element. The script will only load sources that match the digest and an attacker will be unable to modify the script even when the server is compromised.

Subresource integrity (SRI) checking is commonly recommended when importing a fixed version of a library - for example, from a CDN (content-delivery network). Then, the fixed digest of that version of the library can easily be added to the script element's integrity attribute.

A dynamic service cannot be easily used with SRI. Nevertheless, it is possible to list multiple acceptable SHA hashes in the integrity attribute, such as those for the content generated for major browers used by your users.

See the [`CUSTOMIZING.md`](https://github.com/github/codeql/blob/main/javascript/ql/src/Security/CWE-830/CUSTOMIZING.md) file in the source code for this query for information on how to extend the list of hostnames required to use SRI by this query.

Recommendation

When an iframe element is used to embed a page, it is important to use an https URL.

When using a script element to load a script, it is important to use an https URL and to consider checking subresource integrity.

Example

The following example loads the jQuery library from the jQuery CDN without using https and without checking subresource integrity.

<html>
    <head>
        <title>jQuery demo</title>
        <script src="http://code.jquery.com/jquery-3.6.0.slim.min.js" crossorigin="anonymous"></script>
    </head>
    <body>
        ...
    </body>
</html>

Instead, loading jQuery from the same domain using https and checking subresource integrity is recommended, as in the next example.

<html>
    <head>
        <title>jQuery demo</title>
        <script src="https://code.jquery.com/jquery-3.6.0.slim.min.js" integrity="sha256-u7e5khyithlIdTpu22PHhENmPcRdFiHRjhAuHcs05RI=" crossorigin="anonymous"></script>
    </head>
    <body>
        ...
    </body>
</html>

References

@aegilops aegilops mentioned this pull request Jul 1, 2024
@aegilops
Copy link
Contributor Author

aegilops commented Jul 1, 2024

Because Polyfill.io dynamically generates its content, SRI is apparently impossible.

I'm going to redraft this tomorrow to account for this.

I will refactor the library I made to separate known-bad domains from those requiring SRI.

I also need to remove any references to SRI being a mitigation, and edit the Cloudflare example somehow to make it work - my current example assumes SRI will function correctly.

@erik-krogh
Copy link
Contributor

Hmmm. From reading the updates in the article: https://sansec.io/research/polyfill-supply-chain-attack it sounds like the problem has been fixed.
The domain no longer exists (see status field in https://who.is/whois/polyfill.io), so https://polyfill.io no longer exists (you can safely click the link, it doesn't work).

@aegilops
Copy link
Contributor Author

aegilops commented Jul 8, 2024

That's totally true.

An alert about using this domain isn't going to change future exposure, but rather alert people to the fact that there was prior exposure.

That allows them to start a security response.

aegilops added 4 commits July 8, 2024 10:56
Moved lists of domains to data extensions, including adding those to the overall qlpack.yml

Expanded scope of new query to further domains operated by the untrusted owners of polyfill.io
@aegilops
Copy link
Contributor Author

aegilops commented Jul 9, 2024

I've made a few changes in the interest of getting this merged.

I've extended the query now using data extensions, so it includes more domains operated by the same CDN operator who used polyfill.io maliciously. Those are still active, and present a clear risk to users of them who do not use SRI.

Nevertheless, I moved it to an experimental query. I'd rather it was merged into the main queries for widespread visibility of code that is or was exposed to this risk, but will accept merging into experimental and escalating a move to the main queries later.

It also allows users to extend the query using a data extensions model pack.

I also changed the existing CDN query to allow the extension of the CDN list that requires an SRI more easily in a data extension file. It no longer requires escaping dots in a regex - you just add the domain to the next line in a string in a tuple in the YAML file.

Hope that motivates merging this.

@aegilops aegilops changed the title Added detection for specific Polyfill.io CDN compromise - edited existing library and added new query and tests Added detection for untrusted domains in script content - edited existing library and added new query and tests ("the Polyfill PR") Jul 9, 2024
Comment on lines +6 to +14
# new location of the polyfill.io CDN, which was used to serve malware. See: https://www.cside.dev/blog/the-polyfill-attack-explained
- ["polyfill.com"]
- ["polyfillcache.com"]

# domains operated by the same owner as polyfill.io, which was used to serve malware. See: https://www.cside.dev/blog/the-polyfill-attack-explained
- ["bootcdn.net"]
- ["bootcss.com"]
- ["staticfile.net"]
- ["staticfile.org"]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice, those are still active.
That removes my argument for keeping the new query in the experimental/ folder, you can move the new query to the default suite.

@aegilops
Copy link
Contributor Author

I've moved the query into the main set of queries, and slightly updated the changes-notes entry. Nothing practical changed, so 🤞 it passes cleanly in the new location

@erik-krogh erik-krogh added the ready-for-doc-review This PR requires and is ready for review from the GitHub docs team. label Jul 12, 2024
@erik-krogh
Copy link
Contributor

I did some evaluations, and they look good: (nightly, default (internal links)).
There are some TPs that I like.
Including a large webpage that previously used polyfill.io (but they have since changed to use the cloudfare hosted copy).

There is a lot of documentation here, and I would like some eyes from the doc team, but the PR is good to merge after that.
(The doc team gets pinged with the label I just added).
I'm on PTO the next few weeks, but you can ask in #codeql-dynamic for someone else to press the merge button.
Tell them that Erik said it's OK to merge.

felicitymay
felicitymay previously approved these changes Jul 12, 2024
Copy link
Contributor

@felicitymay felicitymay left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A few small docs suggestions, but it's good to have help with lots of context like this 👍🏻

Docs suggestions accepted, thank you 🙏

Co-authored-by: Felicity Chapman <felicitymay@github.com>
@aegilops
Copy link
Contributor Author

@felicitymay thanks so much for the suggestions!

I transferred my note about using SRI with dynamic services into the other query help too. It's the same wording in both places now.

I thought about usability and added a note on CUSTOMIZING.md into a separate file from the query help, and referenced its existence in the query help.

It describes how someone can extend the queries with data extensions, to add their own hostname that needs SRI, or to add their own untrusted domain.

Copy link
Contributor

@felicitymay felicitymay left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for adding information about how to customize the queries 💖

This looks good to me from a docs point of view, but I'd recommend moving the new article into a location that's easier to link to and where it sits alongside related content for discoverability.

I suspect that this addition might require a 👍🏻 from the CodeQL dynamic language team 🤔

javascript/ql/src/Security/CWE-830/CUSTOMIZING.md Outdated Show resolved Hide resolved
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suggest that we move this information into the Customizing Library Models for JavaScript — CodeQL article as a practical example. Alternatively, we could create a new article in https://github.com/github/codeql/tree/main/docs/codeql/codeql-language-guides, which could be extended with future examples.

This would require converting the format to RST, but the content is fairly simple so this shouldn't be too much of an overhead and we can provide support if needed.

javascript/ql/src/Security/CWE-830/CUSTOMIZING.md Outdated Show resolved Hide resolved
javascript/ql/src/Security/CWE-830/CUSTOMIZING.md Outdated Show resolved Hide resolved
javascript/ql/src/Security/CWE-830/CUSTOMIZING.md Outdated Show resolved Hide resolved
@aegilops
Copy link
Contributor Author

aegilops commented Jul 12, 2024

Can I propose using the current in-repo markdown approach, and coming back after next week with a second PR to merge it into CodeQL docs, to improve things?

I'll definitely accept your suggestions when I'm back at my IDE, but I don't really want to slow down merging this to get this customisation docs part perfect.

Sound OK, or no go?

@felicitymay
Copy link
Contributor

Can I propose using the current in-repo markdown approach, and coming back after next week with a second PR to merge it into CodeQL docs, to improve things?

Happy to accept this approach. In that case, can we link directly to where the new file will be, that is: https://github.com/github/codeql/blob/main/javascript/ql/src/Security/CWE-830/CUSTOMIZING.md instead of just mentioning the file name?

Copy link
Contributor

@felicitymay felicitymay left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the updates 👍🏻

@sidshank sidshank merged commit 772344d into github:main Jul 12, 2024
16 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
documentation JS ready-for-doc-review This PR requires and is ready for review from the GitHub docs team.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants