Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[azure] Broken listdir #1431

Open
marcinfair opened this issue Jul 11, 2024 · 9 comments
Open

[azure] Broken listdir #1431

marcinfair opened this issue Jul 11, 2024 · 9 comments

Comments

@marcinfair
Copy link

Since #1403 the logic of listdir was changed and this is not compatible with standard lib os.listdir. Do you plane do leave it as is or fix this behavior?

Issue occurs since v.1.14.4

thanx

@LeaveMyYard
Copy link

LeaveMyYard commented Jul 15, 2024

Not sure if this is 100% caused by this issue, but looks like that.

I am doing

python manage.py collectstatic --noinput --verbosity 2

and after 1.14.4 it started failing. From logs I see that prior to 1.14.4 almost all files are like this:

Skipping 'js/login.js' (not modified)
Skipping 'js/main.js' (not modified)
Skipping 'js/features/sharing.js' (not modified)
Skipping 'js/features/poll_auth.js' (not modified)

But int 1.14.4 they are:

Copying '/app/bonuses/static/js/login.js'
Copying '/app/bonuses/static/js/main.js'
Copying '/app/bonuses/static/js/features/sharing.js'
Copying '/app/bonuses/static/js/features/poll_auth.js'

So I assume the issue is indeed with listing the direction. The main issue is that for me it results in

ValueError: The file 'admin/img/sorting-icons.svg' could not be found with <bonuses.storage.StaticStorage object at 0x7f5809462530>.

My storage is

class CustomS3Boto3Storage(S3Boto3Storage):
    display_location: str

    def url(self, name: str) -> str:
        return f"{self.display_location}/{name}"
 
 
class StaticStorage(ManifestFilesMixin, CustomS3Boto3Storage):
    bucket_name: str = settings.AWS_STORAGE_BUCKET_NAME
    location: str = f"{settings.AWS_LOCATION}/static"
    display_location: str = (
        f"{settings.STATIC_CDN_CUSTOM_DOMAIN}/{settings.DO_SPACES_BUCKET_NAME}/{settings.DO_SPACES_ENV_PREFIX}/static"
    )

Both in 1.13 and 1.14.3 everything works fine

@jschneier
Copy link
Owner

Hi, thanks for the reports.

The comment in reply to my question: #1403 (comment)

@tonybaloney do you have any thoughts here? @Frodothedwarf can you shed further light on why this change had to be made?

For Azure I rely largely on the community.

@Frodothedwarf
Copy link
Contributor

@LeaveMyYard you seem to be using AWS S3 storage is that correct?
@jschneier I haven't touched AWS S3, but only the Azure part, which shouldn't affect how S3 file storage is handled.

The justifications for why this changes were made are that Azure does not support crawling like Django is trying to do when handling files (the part where it goes into one folder, then subfolders and proceeds until no folders is left. Instead Azure returns all files available, resulting in one API call, which is much more efficient than trying to crawl. However I have seen that Azure supports the listing of files with prefix of the folder, like "folder1/". If this needs to be changed I can make a new pull request solving this.

I made the change due to Djangos "--clear" command was broken.

@marcinfair where do you make use of the os.listdir()?

@LeaveMyYard
Copy link

@Frodothedwarf Digital Ocean Spaces to be precise

@Frodothedwarf
Copy link
Contributor

@Frodothedwarf Digital Ocean Spaces to be precise

I haven't touched that part of django-storages, I only made changes to the Azure part.
@jschneier maybe it has something to do with:

S3
  Pull AWS_SESSION_TOKEN from the environment (https://github.com/jschneier/django-storages/pull/1399)
  Fix newline handling for text mode files (https://github.com/jschneier/django-storages/pull/1381)
  Do not sign URLs when querystring_auth=False e.g public buckets or static files (https://github.com/jschneier/django-storages/pull/1402)
  Cache CloudFront Signers (https://github.com/jschneier/django-storages/pull/1417)

@marcinfair
Copy link
Author

@LeaveMyYard you seem to be using AWS S3 storage is that correct? @jschneier I haven't touched AWS S3, but only the Azure part, which shouldn't affect how S3 file storage is handled.

The justifications for why this changes were made are that Azure does not support crawling like Django is trying to do when handling files (the part where it goes into one folder, then subfolders and proceeds until no folders is left. Instead Azure returns all files available, resulting in one API call, which is much more efficient than trying to crawl. However I have seen that Azure supports the listing of files with prefix of the folder, like "folder1/". If this needs to be changed I can make a new pull request solving this.

I made the change due to Djangos "--clear" command was broken.

@marcinfair where do you make use of the os.listdir()?

I just listing files in azure storage in some of my custom functions. Maybe I wasn't super clear. What I meant, was that previous implementation of AzureStorage.listdir() returned names of files in azure storage folder without prepended full path, which was the same behaviour as os.listdir().

I decided to upgrade to v.1.14.4 and make workaround by implementing old behaviour of listdir with custom method on AzureStorage class.

class CustomAzureStorage(AzureStorage):
    def listdir_custom(self, path=""):
        """
        brings back listdir from below v. 1.14.4
        """
        files = []
        dirs = set()
        for name in self.list_all(path):
            n = name[len(path) :]
            if "/" in n:
                dirs.add(n.split("/", 1)[0])
            else:
                files.append(n)
        return list(dirs), files

@Frodothedwarf
Copy link
Contributor

I guess this could be implmeented correctly, if I understand correctly you want the full path?

@marcinfair
Copy link
Author

Previous implementation of listdir() I guess was fine. Method had been returning tuple[list, list]. And relevant part of it was second element:files: list, which was list of all files in specific 'folder' without prepending path. Only filenames. This implementation was in line with mentioned previously os.listdir() which also returns files relative to current working directory (cwd), so api would be more intuitive and easy to consume by newcomers.

@BradWells
Copy link

I just ran into this as well.

The 1.14.3 implementation almost correct, but it had a bug when the file path was not appended (see #1270 )

    def listdir(self, path=""):
        """
        Return directories and files for a given path.
        Leave the path empty to list the root.
        Order of dirs and files is undefined.
        """
        files = []
        dirs = set()
        for name in self.list_all(path):
            n = name[len(path) :]
            if "/" in n:  # <----- THIS RETURNS '' ON EVERY FILE IF PATH DOESN'T HAVE TRAILING SLASH
                dirs.add(n.split("/", 1)[0])
            else:
                files.append(n)
        return list(dirs), files

I assume because of that problem, the implementation was assumed to not work at all, so it was changed to this

    def listdir(self, path=""):
        """
        Return all files for a given path.
        Given that Azure can't return paths it only returns files.
        Works great for our little adventure.
        """

        return [], self.list_all(path)

However, I assume that this could be fixed with something as simple as

    def listdir(self, path=""):
        """
        Return directories and files for a given path.
        Leave the path empty to list the root.
        Order of dirs and files is undefined.
        """
        files = []
        dirs = set()
        cleaned_path = path
        if cleaned_path and not cleaned_path.endswith('/'):
           cleaned_path = path + '/'
        for name in self.list_all(path):
            n = name[len(cleaned_path) :]
            if "/" in n:  # <----- THIS RETURNS '' ON EVERY FILE IF PATH DOESN'T HAVE TRAILING SLASH
                dirs.add(n.split("/", 1)[0])
            else:
                files.append(n)
        return list(dirs), files

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

5 participants