Support large files #582

Merged: 78 commits merged into main from feat/big-file-batching on Nov 14, 2024
Conversation

@Ceredron (Collaborator) commented Nov 8, 2024

Description

Added support for very large files when virus scanning is disabled.

Related Issue(s)

Verification

  • Your code builds clean without any errors or warnings
  • Manual testing done (required)
  • Relevant automated test added (if you find this hard, leave it and we'll help out)
  • All tests run green

Documentation

  • User documentation is updated with a separate linked PR in altinn-studio-docs. (if applicable)

Summary by CodeRabbit

Release Notes

  • New Features

    • Introduced new environment variables for Azure Storage configuration: BlockSize, ConcurrentUploadThreads, and BlocksBeforeCommit.
    • Added a new project for testing large file uploads.
    • Implemented a console application for testing file uploads to the Altinn Broker API.
  • Enhancements

    • Improved error handling and request validation in file transfer processes.
    • Expanded the ServiceOwnerOverviewExt class to manage multiple storage providers.
    • Added support for disabling virus scanning during file transfers.
    • Enhanced the AzureStorageOptions class for better configuration management.
    • Updated service registrations in dependency injection for improved clarity and functionality.
  • Bug Fixes

    • Resolved issues with file transfer size validations and error handling for uploads.
  • Documentation

    • Updated documentation to include instructions for testing large file uploads.
  • Chores

    • Cleaned up unused code and removed specific file references from project files.

…ately. Also remove logic for calculating block size for now.
…g. Changed to read into our own buffer until it's full. Will try to reduce # buffers in next iteration.
…ting BlockBlobClient uri directly as it did not work with Azurite.
@Ceredron added the kind/feature label ("A distinct functionality or enhancement added to a product to provide specific value to users") on Nov 13, 2024
@Ceredron changed the title from "Feat/big file batching" to "Support large files" on Nov 14, 2024
@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 4

🧹 Outside diff range and nitpick comments (5)
src/Altinn.Broker.Core/Domain/ServiceOwnerEntity.cs (1)

9-9: Add XML documentation for the StorageProviders property.

Since this is a public property in a domain entity, adding XML documentation would improve maintainability and API documentation.

+    /// <summary>
+    /// Collection of storage providers configured for this service owner.
+    /// Each provider represents a different storage configuration with or without virus scanning.
+    /// </summary>
     public required List<StorageProviderEntity> StorageProviders { get; set; }
src/Altinn.Broker.Integrations/Azure/AzureStorageService.cs (4)

19-21: Consider externalizing configuration values

These hardcoded constants should be moved to configuration to allow tuning based on different environments and requirements. This is especially important for values that affect performance and resource usage.

Consider moving these to configuration:

-    private const int BLOCK_SIZE = 1024 * 1024 * 32; // 32MB
-    private const int BLOCKS_BEFORE_COMMIT = 1000;
-    private const int UPLOAD_THREADS = 3;
+    private readonly int _blockSize;
+    private readonly int _blocksBeforeCommit;
+    private readonly int _uploadThreads;
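As a point of reference, here is a minimal sketch of what the options class backing these values could look like. The release notes name BlockSize, ConcurrentUploadThreads, and BlocksBeforeCommit as the new environment variables, but the exact property names and defaults below are assumptions, not the actual contents of AzureStorageOptions.cs:

public class AzureStorageOptions
{
    // Defaults assumed to mirror the previously hardcoded constants
    public int BlockSize { get; set; } = 1024 * 1024 * 32;  // 32 MiB per staged block
    public int ConcurrentUploadThreads { get; set; } = 3;   // parallel block uploads
    public int BlocksBeforeCommit { get; set; } = 1000;     // blocks staged between commits
}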

64-68: Consider using ArrayPool for buffer management

The current implementation creates new buffers for each block. Using ArrayPool could reduce memory allocations and garbage collection pressure.

-            var networkReadBuffer = new byte[1024 * 1024];
+            byte[] networkReadBuffer = ArrayPool<byte>.Shared.Rent(1024 * 1024);
+            try {
+                // ... existing code ...
+            } finally {
+                ArrayPool<byte>.Shared.Return(networkReadBuffer);
+            }
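For completeness, a self-contained sketch of the rent/return pattern under the same assumptions (a source stream and a 1 MiB read size); note that Rent may hand back a larger array than requested, so the code slices explicitly:

using System.Buffers;

static async Task<int> ReadBlockAsync(Stream source)
{
    byte[] buffer = ArrayPool<byte>.Shared.Rent(1024 * 1024);
    try
    {
        // Rent may over-allocate, so only the first 1 MiB is used
        int read = await source.ReadAsync(buffer.AsMemory(0, 1024 * 1024));
        // ... hand buffer[0..read] to the block accumulation logic ...
        return read;
    }
    finally
    {
        ArrayPool<byte>.Shared.Return(buffer);
    }
}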

213-215: Enhance retry logging

The current logging only includes the error message. Adding attempt number and wait duration would help with monitoring and debugging.

-                logger.LogWarning($"Error during retries: {ex.Message}");
+                logger.LogWarning(
+                    "Retry attempt {Attempt} after error. Waiting {WaitTime}s. Error: {Error}",
+                    attempt, timeSpan.TotalSeconds, ex.Message);

Note that attempt is only in scope if the onRetry callback is switched to the WaitAndRetryAsync overload that also receives the retry count and context.

142-143: Use structured logging

Replace string interpolation with structured logging to improve log parsing and analysis.

-            logger.LogError("Error occurred while uploading file: {errorMessage}: {stackTrace} ", ex.Message, ex.StackTrace);
+            logger.LogError(ex, "Error occurred while uploading file {FileId} for {ServiceOwner}",
+                fileTransferEntity.FileTransferId, serviceOwnerEntity.Name);
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

📥 Commits

Reviewing files that changed from the base of the PR and between c5ba8c5 and 50fa891.

📒 Files selected for processing (4)
  • src/Altinn.Broker.API/Models/ServiceOwner/ServiceOwnerOverviewExt.cs (1 hunks)
  • src/Altinn.Broker.Core/Domain/ServiceOwnerEntity.cs (1 hunks)
  • src/Altinn.Broker.Integrations/Azure/AzureStorageOptions.cs (1 hunks)
  • src/Altinn.Broker.Integrations/Azure/AzureStorageService.cs (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (2)
  • src/Altinn.Broker.API/Models/ServiceOwner/ServiceOwnerOverviewExt.cs
  • src/Altinn.Broker.Integrations/Azure/AzureStorageOptions.cs
🧰 Additional context used
📓 Learnings (1)
src/Altinn.Broker.Core/Domain/ServiceOwnerEntity.cs (1)
Learnt from: Ceredron
PR: Altinn/altinn-broker#582
File: src/Altinn.Broker.Core/Domain/ServiceOwnerEntity.cs:11-21
Timestamp: 2024-11-14T07:00:59.444Z
Learning: In the Altinn.Broker project, exceptions should not be used for control flow or logic; prefer returning null or appropriate results instead.
🔇 Additional comments (2)
src/Altinn.Broker.Core/Domain/ServiceOwnerEntity.cs (1)

11-21: Verify null handling in dependent code.

Since GetStorageProvider can return null in more scenarios now (empty list case), we should verify that all callers properly handle null returns.

✅ Verification successful

Null handling is properly implemented in all callers

All direct callers of GetStorageProvider have proper null checks in place:

  • InitializeFileTransferHandler.cs: Returns Errors.StorageProviderNotReady if null
  • UploadFileHandler.cs: Returns Errors.StorageProviderNotReady if null
  • AzureStorageService.cs: Uses the result in an async chain where null would propagate errors appropriately
🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Search for direct calls to GetStorageProvider to verify null handling
rg -A 5 "GetStorageProvider\(" --type cs

Length of output: 2799

src/Altinn.Broker.Integrations/Azure/AzureStorageService.cs (1)

1-219: Implementation approved with security considerations

The implementation follows security best practices:

  • Proper error handling and cleanup
  • Secure blob storage access
  • Data integrity verification
  • Appropriate concurrency controls

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 2

🧹 Outside diff range and nitpick comments (14)
src/Altinn.Broker.Core/Options/GeneralSettings.cs (1)

Line range hint 1-6: Document configuration changes in README

Since this is part of a larger configuration restructuring (moving from ApplicationSettings to AzureStorageOptions), consider adding a migration guide or updating the configuration documentation to reflect these changes.

#!/bin/bash
# Check if README or documentation files exist that might need updating
fd -e md -e txt README
tests/Altinn.Broker.Tests.LargeFile/PseudoRandomDataStream.cs (6)

2-4: Enhance class documentation with XML comments.

The current documentation is minimal. Consider adding comprehensive XML documentation that includes:

  • Purpose of the class
  • Usage examples
  • Parameters and their constraints
  • Performance characteristics
-/**
- * Pseudo-random generator
-*/
+/// <summary>
+/// A stream implementation that generates pseudo-random data for testing large file uploads.
+/// This stream simulates reading from a large file without consuming actual disk space.
+/// </summary>
+/// <remarks>
+/// Usage example:
+/// <code>
+/// using var stream = new PseudoRandomDataStream(1024 * 1024 * 100); // 100MB stream
+/// await UploadService.UploadFile(stream);
+/// </code>
+/// </remarks>

13-13: Remove unused field bytesRead.

The field bytesRead is declared but never used throughout the class.

-    long bytesRead = 0;

15-21: Improve timer configuration and initialization.

The timer configuration could be improved for better maintainability and safety:

  • Timer period is hardcoded
  • Timer callback isn't protected against object disposal
+    private const int PROGRESS_UPDATE_INTERVAL_MS = 1000;
+    private volatile bool _isDisposed;
+
     public PseudoRandomDataStream(long length)
     {
         _length = length;
         _position = 0;
         _rng = new XorShiftRandom();
-        _timer = new Timer(OnTimerElapsed, null, 1000, 1000);
+        _timer = new Timer(OnTimerElapsed, null, PROGRESS_UPDATE_INTERVAL_MS, PROGRESS_UPDATE_INTERVAL_MS);
     }

39-39: Remove unnecessary override of CanTimeout.

The CanTimeout override is unnecessary as it simply returns the base implementation.

-    public override bool CanTimeout => base.CanTimeout;

113-116: Improve random seed initialization.

Using system ticks as a seed might be predictable if multiple instances are created in quick succession. Consider using a more robust seed source.

     public XorShiftRandom()
     {
-        _state = (ulong)DateTime.UtcNow.Ticks;
+        var buffer = new byte[8];
+        System.Security.Cryptography.RandomNumberGenerator.Fill(buffer);
+        _state = BitConverter.ToUInt64(buffer, 0);
     }
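For context, since the excerpt never shows the generator's step function, this is the standard xorshift64 step (Marsaglia's 13/7/17 shifts) that a class named XorShiftRandom presumably implements; an assumed sketch, not the file's actual code:

public ulong NextUInt64()
{
    // State must stay nonzero; the crypto-seeded constructor above makes
    // a zero seed astronomically unlikely
    ulong x = _state;
    x ^= x << 13;
    x ^= x >> 7;
    x ^= x << 17;
    _state = x;
    return x;
}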

23-27: Improve progress reporting mechanism.

The current implementation has several issues:

  • Direct console writes might not be appropriate for a utility class
  • Integer division could cause precision loss in speed calculation

Consider implementing a proper progress reporting mechanism:

+    public delegate void ProgressCallback(double percentComplete, double speedMiBps);
+    private readonly ProgressCallback _progressCallback;
+
+    public PseudoRandomDataStream(long length, ProgressCallback progressCallback = null)
+    {
+        _length = length;
+        _progressCallback = progressCallback;
+        // ... rest of constructor
+    }
+
     private void OnTimerElapsed(object state)
     {
         timerElapsedCount++;
-        Console.WriteLine($"Current position {(_position * 100.0 / _length):F3}%. Average speed so far is {(_position / timerElapsedCount) / (1024 * 1024)} MiB/s");
+        if (_progressCallback != null)
+        {
+            double percentComplete = (_position * 100.0) / _length;
+            double speedMiBps = (_position / (double)timerElapsedCount) / (1024 * 1024);
+            _progressCallback(percentComplete, speedMiBps);
+        }
     }
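A hypothetical call site for the callback-based constructor suggested above:

using var stream = new PseudoRandomDataStream(
    100L * 1024 * 1024 * 1024, // 100 GiB of pseudo-random data
    (percent, speedMiBps) => Console.WriteLine($"{percent:F1}% at {speedMiBps:F1} MiB/s"));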
tests/Altinn.Broker.Tests.LargeFile/Program.cs (3)

24-25: Remove logging of potentially sensitive information

Consider removing or masking the logging of configuration values that might contain sensitive information in production environments.


75-75: Consider extracting hardcoded values to configuration

The URL, environment, and scopes are hardcoded. Consider moving these to configuration for better maintainability.


142-152: Document security implications of disabled virus scanning

The DisableVirusScan = true setting has security implications that should be documented. Also, consider documenting why Checksum is set to null.

Add XML documentation to explain the security considerations:

+    /// <summary>
+    /// Creates a basic file transfer configuration for testing large files.
+    /// </summary>
+    /// <remarks>
+    /// - Virus scanning is disabled to support large file uploads
+    /// - Checksum is set to null for performance in testing scenarios
+    /// </remarks>
     private static FileTransferInitalizeExt BasicFileTransfer() => new FileTransferInitalizeExt()
src/Altinn.Broker.Integrations/Azure/AzureStorageService.cs (4)

21-34: Consider making the network timeout configurable

The 24-hour network timeout is hardcoded. Consider moving this to the AzureStorageOptions for better configurability across different environments.

-            NetworkTimeout = TimeSpan.FromHours(24),
+            NetworkTimeout = TimeSpan.FromSeconds(azureStorageOptions.Value.NetworkTimeoutSeconds),

62-64: Optimize buffer sizes for better memory usage

The buffer sizes could be optimized:

  1. Initialize accumulationBuffer with the block size capacity
  2. Make networkReadBuffer size configurable
-            using var accumulationBuffer = new MemoryStream();
-            var networkReadBuffer = new byte[1024 * 1024];
+            using var accumulationBuffer = new MemoryStream(capacity: azureStorageOptions.Value.BlockSize);
+            var networkReadBuffer = new byte[azureStorageOptions.Value.NetworkBufferSize];

161-164: Use constant for HTTP status code

Replace magic number with a named constant for better maintainability.

-            if (blockResponse.GetRawResponse().Status != 201)
+            const int StatusCreated = 201;
+            if (blockResponse.GetRawResponse().Status != StatusCreated)

206-214: Make retry policy configurable

Consider moving retry configuration to AzureStorageOptions for better flexibility across environments.

-            3,
-            attempt => TimeSpan.FromSeconds(Math.Pow(2, attempt)),
+            azureStorageOptions.Value.MaxRetryAttempts,
+            attempt => TimeSpan.FromSeconds(Math.Pow(azureStorageOptions.Value.RetryBaseSeconds, attempt)),
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

📥 Commits

Reviewing files that changed from the base of the PR and between 6d56155 and 1fb3509.

📒 Files selected for processing (8)
  • .azure/modules/containerApp/main.bicep (1 hunks)
  • src/Altinn.Broker.API/Program.cs (1 hunks)
  • src/Altinn.Broker.API/appsettings.Development.json (1 hunks)
  • src/Altinn.Broker.Core/Options/GeneralSettings.cs (1 hunks)
  • src/Altinn.Broker.Integrations/Azure/AzureStorageService.cs (1 hunks)
  • src/Altinn.Broker.Persistence/Migrations/V0010__toggleable_virusscan.sql (1 hunks)
  • tests/Altinn.Broker.Tests.LargeFile/Program.cs (1 hunks)
  • tests/Altinn.Broker.Tests.LargeFile/PseudoRandomDataStream.cs (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (4)
  • .azure/modules/containerApp/main.bicep
  • src/Altinn.Broker.API/Program.cs
  • src/Altinn.Broker.API/appsettings.Development.json
  • src/Altinn.Broker.Persistence/Migrations/V0010__toggleable_virusscan.sql
🧰 Additional context used
📓 Learnings (1)
tests/Altinn.Broker.Tests.LargeFile/Program.cs (2)
Learnt from: Ceredron
PR: Altinn/altinn-broker#582
File: tests/Altinn.Broker.Tests.LargeFile/Program.cs:80-82
Timestamp: 2024-11-14T07:00:48.796Z
Learning: In test programs like `tests/Altinn.Broker.Tests.LargeFile/Program.cs` in the `Altinn.Broker` project, it's acceptable to proceed without checking the success of HTTP responses or adding additional error handling.
Learnt from: Ceredron
PR: Altinn/altinn-broker#582
File: tests/Altinn.Broker.Tests.LargeFile/Program.cs:147-147
Timestamp: 2024-11-14T07:01:04.663Z
Learning: In the project `Altinn.Broker.Tests.LargeFile`, the codebase uses C# 12 features like collection expressions, such as initializing collections with `PropertyList = []`. Do not flag this syntax as invalid in future reviews.
🔇 Additional comments (7)
src/Altinn.Broker.Core/Options/GeneralSettings.cs (1)

Line range hint 1-6: Verify removal of CorrespondenceBaseUrl property

The removal of CorrespondenceBaseUrl aligns with the shift towards Azure storage configurations. However, let's verify there are no remaining dependencies on this property.

✅ Verification successful

Removal of CorrespondenceBaseUrl property is safe

Based on the git history and codebase analysis:

  • The property was only defined in GeneralSettings.cs without any usage across the codebase
  • No references found in test files or documentation
  • The removal aligns with the PR's objective of streamlining configuration for Azure storage support
  • The property appears to have been unused since its introduction
🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Search for any remaining references to CorrespondenceBaseUrl
# This will help identify if the property is still being used anywhere in the codebase

echo "Searching for direct property references..."
rg "CorrespondenceBaseUrl" --type cs

echo "Searching for configuration bindings in appsettings files..."
rg '"CorrespondenceBaseUrl"' --type json

Length of output: 307


Script:

#!/bin/bash
# Let's try a more comprehensive search strategy:

# 1. Search for any historical references in git history
echo "Checking git history for CorrespondenceBaseUrl..."
git log -p | rg -i "correspondencebaseurl"

# 2. Search for any partial matches or related terms
echo "Searching for partial matches..."
rg -i "correspondence" --type cs -A 2 -B 2

# 3. Check if there are any configuration files that might reference it
echo "Listing configuration files..."
fd "appsettings" --extension json --exec cat {}

Length of output: 461


Script:

#!/bin/bash
# Let's check the specific commit that removed this property to understand the context better
# and look for any related changes in tests or documentation

# 1. Find the commit that removed CorrespondenceBaseUrl
echo "Finding the commit that removed CorrespondenceBaseUrl..."
git log -p -- "**/GeneralSettings.cs" | rg -B 5 -A 5 "CorrespondenceBaseUrl"

# 2. Check for any test files that might have been modified in the same commit
echo "Checking for test files with correspondence references..."
fd "test" --type f --extension cs --exec rg -i "correspondence" {}

# 3. Look for any documentation changes
echo "Checking documentation files..."
fd . --extension md --exec rg -i "correspondence" {}

Length of output: 1020

tests/Altinn.Broker.Tests.LargeFile/Program.cs (3)

1-16: LGTM! Clean file structure with necessary imports.

The file is well-organized with appropriate imports and clear constant definitions.


22-22: ⚠️ Potential issue

Add validation for GIGABYTES_TO_UPLOAD parsing

Using int.Parse without validation could throw an exception if the environment variable contains invalid data.

Apply this diff:

-int gbsToUpload = Environment.GetEnvironmentVariable("GIGABYTES_TO_UPLOAD") is not null ? int.Parse(Environment.GetEnvironmentVariable("GIGABYTES_TO_UPLOAD")) : 100;
+int gbsToUpload = 100;
+if (int.TryParse(Environment.GetEnvironmentVariable("GIGABYTES_TO_UPLOAD"), out int parsedValue))
+{
+    gbsToUpload = parsedValue;
+}

Likely invalid or redundant comment.


44-44: Consider reducing the HTTP client timeout

A 48-hour timeout seems excessive and could lead to resource exhaustion. Consider using a more reasonable timeout (e.g., 2-4 hours) unless there's a specific requirement for such a long duration.

Let's verify if such long timeouts are needed based on the file sizes being tested:

✅ Verification successful

Based on the code analysis, I can now provide a final response regarding the 48-hour timeout concern:

The 48-hour timeout is justified for large file transfers

The code reveals this is a large file transfer test client specifically designed to handle uploads up to 100GB (configurable through GIGABYTES_TO_UPLOAD). The long timeout is necessary because:

  • The application's MaxFileUploadSize constant allows far larger transfers (32L * 50000 * 1024 * 1024 in ApplicationConstants.cs, i.e. 50,000 blocks of 32 MiB, roughly 1.5 TiB)
  • The test client uses a modest buffer size of 1MB (BufferSize = 1024*1024*1)
  • Upload speeds can vary significantly in different environments

For perspective, even at a consistent 10MB/s upload speed, a 100GB file would take nearly 3 hours. The 48-hour timeout provides a safe margin for slower connections and potential network instabilities during large file transfers.

🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Search for any configuration or documentation mentioning file size limits
rg -i "file.*size|size.*limit" --type md

Length of output: 42


Script:

#!/bin/bash
# Let's search for file size related constants, variables, and configurations in the codebase
rg -i "filesize|file.*size|size.*limit" -A 3

# Also search for any test files or configurations mentioning large files
fd -e cs -e json -e config "large" --exec grep -l -i "file.*size|size.*limit" {} \;

# Let's also check the specific test file context
cat "tests/Altinn.Broker.Tests.LargeFile/Program.cs"

Length of output: 34286

src/Altinn.Broker.Integrations/Azure/AzureStorageService.cs (3)

19-20: LGTM: Clean dependency injection implementation

The constructor properly injects required dependencies using C# 12's primary constructor syntax.


36-50: LGTM: Efficient streaming download implementation

The download implementation correctly uses streaming and proper error handling.


52-54: Consider adding file size validation

Add validation for maximum file size to prevent potential DoS attacks through extremely large file uploads.
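One possible shape for such a guard, purely as a sketch: the variable and error names here (fileSize, Errors.FileSizeTooBig) are assumptions modeled on identifiers mentioned elsewhere in this review, not the actual codebase:

// Hypothetical guard in the upload path; MaxFileUploadSize is the constant
// referenced from ApplicationConstants.cs earlier in this thread
if (fileSize > ApplicationConstants.MaxFileUploadSize)
{
    return Errors.FileSizeTooBig; // hypothetical error, following the Errors.* result pattern
}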

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

🧹 Outside diff range and nitpick comments (3)
src/Altinn.Broker.Integrations/Azure/AzureStorageService.cs (3)

63-64: Consider configurable buffer sizes

The network read buffer size (1MB) is hardcoded. Consider moving this to AzureStorageOptions for better configurability across different network conditions.


99-102: Improve progress logging format

The current progress logging mixes GiB and MB/s units. Consider standardizing the units and adding percentage completion.

-                            var uploadSpeedMBps = position / (1024.0 * 1024) / (stopwatch.ElapsedMilliseconds / 1000.0);
-                            logger.LogInformation($"Uploaded block {blockList.Count}. Progress: " +
-                                $"{position / (1024.0 * 1024.0 * 1024.0):N2} GiB ({uploadSpeedMBps:N2} MB/s)");
+                            var uploadSpeedGiBps = position / (1024.0 * 1024.0 * 1024.0) / (stopwatch.ElapsedMilliseconds / 1000.0);
+                            var percentComplete = (position * 100.0) / streamLength;
+                            logger.LogInformation($"Uploaded block {blockList.Count}. Progress: " +
+                                $"{position / (1024.0 * 1024.0 * 1024.0):N2} GiB ({percentComplete:N1}%) ({uploadSpeedGiBps:N2} GiB/s)");

204-217: Enhance retry policy configuration and logging

Consider the following improvements:

  1. Move retry configuration (attempts, base delay) to AzureStorageOptions
  2. Add more detailed logging including attempt number and next retry delay
 internal static class BlobRetryPolicy
 {
-    private static IAsyncPolicy RetryWithBackoff(ILogger logger) => Policy
+    private static IAsyncPolicy RetryWithBackoff(ILogger logger, AzureStorageOptions options) => Policy
         .Handle<Exception>()
         .WaitAndRetryAsync(
-            3,
-            attempt => TimeSpan.FromSeconds(Math.Pow(2, attempt)),
+            options.RetryAttempts,
+            attempt => TimeSpan.FromSeconds(Math.Pow(options.RetryBaseDelay, attempt)),
-            (ex, timeSpan) => {
-                logger.LogWarning($"Error during retries: {ex.Message}");
+            (ex, timeSpan, attempt, context) => {
+                logger.LogWarning(
+                    "Retry attempt {Attempt}/{MaxAttempts}. Next attempt in {Delay}s. Error: {Error}",
+                    attempt, options.RetryAttempts, timeSpan.TotalSeconds, ex.Message);
             }
         );
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

📥 Commits

Reviewing files that changed from the base of the PR and between 1fb3509 and 1aa3d2a.

📒 Files selected for processing (1)
  • src/Altinn.Broker.Integrations/Azure/AzureStorageService.cs (1 hunks)
🔇 Additional comments (3)
src/Altinn.Broker.Integrations/Azure/AzureStorageService.cs (3)

1-20: Well-structured class with proper dependency injection!

The class follows best practices with:

  • Clear separation of concerns
  • Proper dependency injection
  • Appropriate use of interfaces

27-31: Review the 24-hour network timeout setting

A 24-hour network timeout might be excessive and could lead to:

  • Hung connections consuming resources
  • Delayed error detection
  • Potential resource exhaustion

Consider implementing a more reasonable timeout (e.g., 1-2 hours) with the option to configure it through AzureStorageOptions.


187-201: LGTM! Robust delete implementation

The delete implementation:

  • Uses DeleteIfExists for idempotency
  • Has proper error handling
  • Includes informative error logging

@Andreass2 (Collaborator) left a comment

Looks good!

@Andreass2 (Collaborator) commented:

From docs: Would it not make sense for us to update the new maxFileTransferSize at the same time as approval, instead of the customer having to do it themselves?

@Ceredron (Collaborator, Author) replied:

From docs: Would it not make sense for us to update the new maxFileTransferSize at the same time as approval, instead of the customer having to do it themselves?

I've been thinking some more about this, and I think it would be best to never set the max limit on the resource at all unless the customer specifically sets it themselves. Originally, we set it to 2GB because that is what the virus scan required. But that limit is better set in code, as it is "our" limit, not the resource limit.

…restricted by block size so they are small by nature. No need for long timeout anymore.
@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 2

🧹 Outside diff range and nitpick comments (3)
src/Altinn.Broker.Integrations/Azure/AzureStorageService.cs (3)

19-20: Add constructor parameter validation

Consider adding null checks for the injected dependencies to fail fast if any required dependency is not properly configured.

 public class AzureStorageService(IResourceManager resourceManager, IOptions<AzureStorageOptions> azureStorageOptions, ILogger<AzureStorageService> logger) : IBrokerStorageService
 {
+    private readonly IResourceManager _resourceManager = resourceManager ?? throw new ArgumentNullException(nameof(resourceManager));
+    private readonly IOptions<AzureStorageOptions> _azureStorageOptions = azureStorageOptions ?? throw new ArgumentNullException(nameof(azureStorageOptions));
+    private readonly ILogger<AzureStorageService> _logger = logger ?? throw new ArgumentNullException(nameof(logger));

57-57: Consider making buffer size configurable

The network read buffer size is hardcoded to 1MB. Consider moving this to configuration to allow tuning based on different environments and scenarios.

-            var networkReadBuffer = new byte[1024 * 1024];
+            var networkReadBuffer = new byte[azureStorageOptions.Value.NetworkBufferSize];

94-95: Add log level configuration for progress updates

The current implementation logs progress for every block at Information level, which could flood the logs. Consider:

  1. Using Debug level for per-block progress
  2. Adding a configurable interval for Information level progress updates
-                            logger.LogInformation($"Uploaded block {blockList.Count}. Progress: " +
+                            logger.LogDebug($"Uploaded block {blockList.Count}. Progress: " +
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

📥 Commits

Reviewing files that changed from the base of the PR and between 1aa3d2a and 44195a4.

📒 Files selected for processing (1)
  • src/Altinn.Broker.Integrations/Azure/AzureStorageService.cs (1 hunks)
🔇 Additional comments (3)
src/Altinn.Broker.Integrations/Azure/AzureStorageService.cs (3)

21-28: Previous comment about hardcoded container name still applies


30-44: Previous comments about stream handling and blob verification still apply


1-211: Implementation successfully addresses large file support requirements

The implementation effectively supports large file uploads by:

  1. Using block blobs with configurable block sizes (see the sketch after this list)
  2. Implementing concurrent uploads with proper memory management
  3. Including MD5 verification for data integrity
  4. Providing robust error handling and retry mechanisms
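To make that flow concrete, here is a minimal sketch of the stage-then-commit pattern described above. It fills each block buffer before staging (matching the commit note about reading into an own buffer until full) but omits the PR's concurrency, MD5 verification, retry, and commit-batching logic; method and variable names are illustrative, and the snippet assumes the Azure.Storage.Blobs package plus .NET implicit usings:

using Azure.Storage.Blobs.Specialized;

static async Task UploadInBlocksAsync(BlockBlobClient blobClient, Stream source, int blockSize)
{
    var blockIds = new List<string>();
    var buffer = new byte[blockSize];
    while (true)
    {
        // Fill the block buffer completely before staging; network streams
        // frequently return partial reads
        int filled = 0;
        int read;
        while (filled < blockSize &&
               (read = await source.ReadAsync(buffer, filled, blockSize - filled)) > 0)
        {
            filled += read;
        }
        if (filled == 0) break; // end of stream

        // Block IDs must be base64-encoded and equal-length within one blob
        string blockId = Convert.ToBase64String(Guid.NewGuid().ToByteArray());
        using var blockStream = new MemoryStream(buffer, 0, filled, writable: false);
        await blobClient.StageBlockAsync(blockId, blockStream);
        blockIds.Add(blockId);
    }
    await blobClient.CommitBlockListAsync(blockIds);
}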

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 0

🧹 Outside diff range and nitpick comments (1)
src/Altinn.Broker.API/Models/FileTransferOverviewExt.cs (1)

35-39: Consider enhancing the property with JsonPropertyName and detailed documentation.

The addition of the UseVirusScan property aligns well with the PR objectives for supporting large files with optional virus scanning. However, consider these improvements:

  1. Add JsonPropertyName attribute for consistency with other properties
  2. Enhance the documentation to include:
    • Default value behavior
    • Relationship with file size limits
    • Impact on file transfer process

Apply this diff to implement the suggestions:

     /// <summary>
-    /// Is virus scan enabled for the file transfer
+    /// Indicates whether virus scanning is enabled for this file transfer.
+    /// When disabled, larger files beyond the standard 2GB limit can be transferred.
+    /// This setting affects the maximum allowed file size and processing behavior.
     /// </summary>
+    [JsonPropertyName("useVirusScan")]
     public bool UseVirusScan { get; set; }
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

📥 Commits

Reviewing files that changed from the base of the PR and between 44195a4 and 138a242.

📒 Files selected for processing (3)
  • src/Altinn.Broker.API/Mappers/FileStatusOverviewExtMapper.cs (1 hunks)
  • src/Altinn.Broker.API/Mappers/FileTransferStatusDetailsExtMapper.cs (1 hunks)
  • src/Altinn.Broker.API/Models/FileTransferOverviewExt.cs (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • src/Altinn.Broker.API/Mappers/FileTransferStatusDetailsExtMapper.cs
🔇 Additional comments (2)
src/Altinn.Broker.API/Mappers/FileStatusOverviewExtMapper.cs (2)

28-29: LGTM! Property reordering maintains functionality.

The reordering of ExpirationTime and Checksum properties is a clean change that preserves the mapping logic.


30-30: Consider adding null handling for UseVirusScan.

The UseVirusScan property mapping is crucial for the large file support feature. Consider adding null handling to ensure consistent behavior:

-            UseVirusScan = fileTransfer.UseVirusScan
+            UseVirusScan = fileTransfer.UseVirusScan ?? true // Default to true for safety

Let's verify the null handling in the domain model:

@Ceredron merged commit 23f259c into main on Nov 14, 2024 (9 checks passed).
@Ceredron deleted the feat/big-file-batching branch on November 14, 2024 at 20:12.