Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Restrict file types #370

Open
wants to merge 4 commits into
base: develop
Choose a base branch
from
Open

Restrict file types #370

wants to merge 4 commits into from

Conversation

harminius
Copy link
Contributor

@harminius harminius commented Feb 6, 2025

Resolves https://github.com/MerginMaps/server-private/issues/2732 https://github.com/MerginMaps/server-private/issues/2724

Problem:
We allow users to upload any file, even the (potentially) malicious ones 🏴‍☠️

See tickets for more details and the Pentest report

Solution:
We are more confident with using blacklists.

Validate extension during project push - aka push_start. Block blacklisted extensions.
Do not rely only on the extension. Determine the file type from its content and block blacklisted types once we have the file.

Do not allow to sync files with extensions unless whitelisted

E.g. block .exe

Whitelist approach is suggested in this PR as it is considered to be safer. However, using blackilists is also a way, see this comment for suggested blacklists.

Potential whitelist are in the comment

Screenshot from 2025-02-06 08-08-00

Extension renaming of unsupported file doesn't help:

Screenshot from 2025-02-06 08-40-48
Screenshot from 2025-02-06 08-08-59

Zipping the unsupported makes it possible to upload.

@harminius harminius requested a review from MarcelGeo February 6, 2025 08:00
server/mergin/sync/storages/disk.py Outdated Show resolved Hide resolved
@@ -16,6 +16,7 @@
from typing import Optional
from sqlalchemy import text
from pathvalidate import validate_filename, ValidationError
import magic
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what about standard python mimetypes library? Look at the example from private repo with images.



def validate_path(filepath: str):
if filepath.startswith(("C://", "../", "..\\")) or os.path.isabs(filepath):
Copy link
Contributor

@MarcelGeo MarcelGeo Feb 6, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This could not handle all cases I think. Bonus point: os.path.isabs is not current machine os agnostic. U need to use ntpath module for handling also windows paths, which could be accidentally stored into db. I think then we do not need this hardcoded things. Examples:

Using ntpath on linux could be ok. os.path.isabs is not os agnostic - if you run it on Linux, it's using posixpath. On the other hand, If you run it on windows it's using ntpath.

>>> ntpath.isabs("C:///////fero")
True

See here:
https://stackoverflow.com/questions/3320406/how-to-check-if-a-path-is-absolute-path-or-relative-path-in-a-cross-platform-way/41846670#41846670

validate_filename(filename)


def validate_file(head, filename):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not sure why we do not have one function for validating mime types and extensions?



def supported_type(head) -> bool:
mime_type = magic.Magic(mime=True).from_buffer(head)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

u can use mimetypes.guess_type. If not found, use extensions.

I think we can allow all image mimetypes (image/). We could add new mimetype to library, example for webp. This could be also usefull for guessing o geographic files.

>>> mimetypes.add_type('image/webp', '.webp')
>>> mimetypes.guess_type("hej.webp")
('image/webp', None)

Mimetypes is guessing by extension. magic is guessing mimetype somehow?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I found web with geographic mimetypes too...

https://www.digipres.org/formats/mime-types/

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yea, we use mimetypes guess for file download, best to be consistent if possible

@coveralls
Copy link

coveralls commented Feb 6, 2025

Pull Request Test Coverage Report for Build 13237918308

Details

  • 35 of 35 (100.0%) changed or added relevant lines in 4 files are covered.
  • 75 unchanged lines in 4 files lost coverage.
  • Overall coverage increased (+0.05%) to 93.279%

Files with Coverage Reduction New Missed Lines %
server/mergin/tests/test_utils.py 1 98.8%
server/mergin/tests/test_project_controller.py 8 98.89%
server/mergin/sync/utils.py 19 83.96%
server/mergin/sync/public_api_controller.py 47 91.86%
Totals Coverage Status
Change from base Build 13199548166: 0.05%
Covered Lines: 6759
Relevant Lines: 7246

💛 - Coveralls

@harminius
Copy link
Contributor Author

harminius commented Feb 7, 2025

Current approach is (in web security recommended) whitelist.
If we are afraid to be too restrictive we could use blackilists.
Blocking logic would stay the same.

Here's the suggestion:

FORBIDDEN_EXTENSIONS = {
    ".ade", ".adp", ".app", ".appcontent-ms", ".application", ".appref-ms",  
    ".asp", ".aspx", ".asx", ".bas", ".bat", ".bgi", ".cab", ".cdxml", ".cer",  
    ".chm", ".cmd", ".cnt", ".com", ".cpl", ".crt", ".csh", ".der", ".diagcab",  
    ".dll", ".drv", ".exe", ".fxp", ".gadget", ".grp", ".hlp", ".hpj", ".hta",  
    ".htc", ".htaccess", ".htpasswd", ".inf", ".ins", ".iso", ".isp",  
    ".its", ".jar", ".jnlp", ".js", ".jse", ".jsp", ".ksh", ".lnk", ".mad",  
    ".maf", ".mag", ".mam", ".maq", ".mar", ".mas", ".mat", ".mau", ".mav",  
    ".maw", ".mcf", ".mda", ".mdb", ".mde", ".mdt", ".mdw", ".mdz", ".msc",  
    ".mht", ".mhtml", ".msh", ".msh1", ".msh2", ".mshxml", ".msh1xml", ".msh2xml",  
    ".msi", ".msp", ".mst", ".msu", ".ops", ".osd", ".pcd", ".pif", ".pl",  
    ".plg", ".prf", ".prg", ".printerexport", ".ps1", ".ps1xml", ".ps2",  
    ".ps2xml", ".psc1", ".psc2", ".psd1", ".psdm1", ".pssc", ".pst", ".py",  
    ".pyc", ".pyo", ".pyw", ".pyz", ".pyzw", ".reg", ".scf", ".scr", ".sct",  
    ".settingcontent-ms", ".sh", ".shb", ".shs", ".sys", ".theme", ".tmp",  
    ".torrent", ".url", ".vb", ".vbe", ".vbp", ".vbs", ".vhd", ".vhdx",  
    ".vsmacros", ".vsw", ".webpnp", ".website", ".ws", ".wsb", ".wsc",  
    ".wsf", ".wsh", ".xbap", ".xll", ".xnk"
}
FORBIDDEN_MIME_TYPES = {
    "application/x-msdownload",
    "application/x-sh",
    "application/x-bat",
    "application/x-msdos-program",
    "application/x-dosexec",
    "application/x-csh",
    "application/x-perl",
    "application/javascript",
    "application/x-python-code",
    "application/x-ruby",
    "application/java-archive",
    "application/vnd.ms-cab-compressed",
    "application/x-ms-shortcut",
    "application/vnd.microsoft.portable-executable",
    "application/x-ms-installer",
    "application/x-ms-application",
    "application/x-ms-wim",
    "text/x-shellscript",
}

@MarcelGeo
Copy link
Contributor

Current approach is (in web security recommended) whitelist. If we are afraid to be too restrictive we could use blackilists. Blocking logic would stay the same.

Here's the suggestion:

FORBIDDEN_EXTENSIONS = {
    ".ade", ".adp", ".app", ".appcontent-ms", ".application", ".appref-ms",  
    ".asp", ".aspx", ".asx", ".bas", ".bat", ".bgi", ".cab", ".cdxml", ".cer",  
    ".chm", ".cmd", ".cnt", ".com", ".cpl", ".crt", ".csh", ".der", ".diagcab",  
    ".dll", ".drv", ".exe", ".fxp", ".gadget", ".grp", ".hlp", ".hpj", ".hta",  
    ".htc", ".htaccess", ".htpasswd", ".img", ".inf", ".ins", ".iso", ".isp",  
    ".its", ".jar", ".jnlp", ".js", ".jse", ".jsp", ".ksh", ".lnk", ".mad",  
    ".maf", ".mag", ".mam", ".maq", ".mar", ".mas", ".mat", ".mau", ".mav",  
    ".maw", ".mcf", ".mda", ".mdb", ".mde", ".mdt", ".mdw", ".mdz", ".msc",  
    ".mht", ".mhtml", ".msh", ".msh1", ".msh2", ".mshxml", ".msh1xml", ".msh2xml",  
    ".msi", ".msp", ".mst", ".msu", ".ops", ".osd", ".pcd", ".pif", ".pl",  
    ".plg", ".prf", ".prg", ".printerexport", ".ps1", ".ps1xml", ".ps2",  
    ".ps2xml", ".psc1", ".psc2", ".psd1", ".psdm1", ".pssc", ".pst", ".py",  
    ".pyc", ".pyo", ".pyw", ".pyz", ".pyzw", ".reg", ".scf", ".scr", ".sct",  
    ".settingcontent-ms", ".sh", ".shb", ".shs", ".sys", ".theme", ".tmp",  
    ".torrent", ".url", ".vb", ".vbe", ".vbp", ".vbs", ".vhd", ".vhdx",  
    ".vsmacros", ".vsw", ".webpnp", ".website", ".ws", ".wsb", ".wsc",  
    ".wsf", ".wsh", ".xbap", ".xll", ".xnk"
}
FORBIDDEN_MIME_TYPES = {
    "application/x-msdownload",
    "application/x-sh",
    "application/x-bat",
    "application/x-msdos-program",
    "application/x-dosexec",
    "application/x-csh",
    "application/x-perl",
    "application/javascript",
    "application/x-python-code",
    "application/x-ruby",
    "application/java-archive",
    "application/vnd.ms-cab-compressed",
    "application/octet-stream",
    "application/x-ms-shortcut",
    "application/vnd.microsoft.portable-executable",
    "application/x-ms-installer",
    "application/x-ms-application",
    "application/x-ms-wim",
    "text/x-shellscript",
}

This looks more sensible for me now.

@harminius harminius changed the base branch from develop to path_validation_in_upload February 7, 2025 14:47
@harminius harminius changed the title Restrict file types and filepaths Restrict file types Feb 10, 2025
@harminius
Copy link
Contributor Author

whitelist:
ALLOWED_EXTENSIONS = {
# Geospatial
# Shapefile components
".shp",
".shx",
".dbf",
".prj",
".cpg",
".qix",
".sbn",
".sbx",
# Vector data formats
".geojson",
".kml",
".kmz",
".gpx",
".dxf",
".svg",
".gpkg",
".json",
# Raster data formats
".tif",
".tiff",
".geotiff",
".asc",
".vrt",
".grd",
".img",
".adf",
# Point cloud data formats
".las",
".laz",
".ply",
".xyz",
".e57",
".pcd",
# Database and container formats
".mbtiles",
".sqlite",
".gpkg",
# Geospatial metadata and styles
".sld",
".qml",
".lyr",
".qgz",
".qgs",
# Other specialized formats
".hdf",
".hdf5",
".netcdf",
".nc",
".dem",
".dt2",
".dt0",
".map",
".tab",
".mif",
".mid",
# Images
".jpg",
".jpeg",
".png",
".bmp",
".gif",
".heic",
".webp",
".tif",
".tiff",
# Text documents
".pdf",
".doc",
".docx",
".odt",
".rtf",
".txt",
# Others
".zip",
}

ALLOWED_MIME_TYPES = {
"application/x-shapefile",
"application/x-dbf",
"text/plain",
"application/octet-stream",
"application/geo+json",
"application/vnd.google-earth.kml+xml",
"application/vnd.google-earth.kmz",
"application/gpx+xml",
"application/geopackage+sqlite3",
"application/vnd.sqlite3",
"application/json",
"text/xml",
"text/html",
"application/vnd.mapbox-vector-tile",
"application/x-sqlite3",
"application/vnd.ogc.sld+xml",
"application/xml",
"application/x-qgis",
"application/x-hdf",
"application/x-hdf5",
"application/x-netcdf",
"application/pdf",
"application/msword",
"application/vnd.openxmlformats-officedocument.wordprocessingml.document",
"application/vnd.oasis.opendocument.text",
"application/rtf",
"text/plain",
"application/zip",
}

def is_supported_extension(filepath) -> bool:
"""Check whether file's extension is supported."""
ext = os.path.splitext(filepath)[1].lower()
return ext not in FORBIDDEN_EXTENSIONS
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this will not for some special files

>>> os.path.splitext(".htpasswd")
('.htpasswd', '')

Base automatically changed from path_validation_in_upload to develop February 10, 2025 11:56
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants