Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Inconsistency behavior of the EncryptionClient wrt the regular client about end of stream behavior #300

Open
NathanEckert opened this issue Jun 19, 2024 · 4 comments

Comments

@NathanEckert
Copy link

NathanEckert commented Jun 19, 2024

Problem:

When using ranged queries with the encryption client, I get a different behavior than with the regular S3 client.
It is not clear there which one is incorrect:

  • the encryption client is consistent with java.io.InputStream#read(byte[], int, int) implementation and javadoc
  • the encryption client is not consistent with the behavior with the SDK v1
package com.test.aws;

import java.io.InputStream;
import java.nio.ByteBuffer;
import java.security.KeyFactory;
import java.security.KeyPair;
import java.security.PrivateKey;
import java.security.PublicKey;
import java.security.spec.PKCS8EncodedKeySpec;
import java.security.spec.X509EncodedKeySpec;
import java.time.Duration;
import java.util.Base64;
import org.junit.jupiter.api.Test;
import software.amazon.awssdk.auth.credentials.DefaultCredentialsProvider;
import software.amazon.awssdk.core.ResponseInputStream;
import software.amazon.awssdk.core.sync.RequestBody;
import software.amazon.awssdk.http.apache.ApacheHttpClient;
import software.amazon.awssdk.http.async.SdkAsyncHttpClient;
import software.amazon.awssdk.http.nio.netty.NettyNioAsyncHttpClient;
import software.amazon.awssdk.regions.Region;
import software.amazon.awssdk.services.s3.S3AsyncClient;
import software.amazon.awssdk.services.s3.S3Client;
import software.amazon.awssdk.services.s3.model.DeleteObjectRequest;
import software.amazon.awssdk.services.s3.model.GetObjectRequest;
import software.amazon.awssdk.services.s3.model.GetObjectResponse;
import software.amazon.awssdk.services.s3.model.PutObjectRequest;
import software.amazon.encryption.s3.S3EncryptionClient;

public class TestEndOfStreamBehavior {
	private static final Region DEFAULT_REGION = AwsTestUtil.DEFAULT_REGION;
	private static final String BUCKET = AwsTestUtil.AWS_TEST_BUCKET;
	private static final String KEY = "filename.txt";
	private static final byte[] CONTENT = "abcdefghijklmnopqrstuvwxyz0123456789".repeat(4).getBytes();

	/** The encryption key to use in client-side encryption tests. */
	protected static final KeyPair KEY_PAIR;


	static {
		final String publicKeyString = "yourPublicKey";
		final String privateKeyString = "yourPrivateKey";
		try {
			final KeyFactory factory = KeyFactory.getInstance("RSA");
			final PublicKey publicKey =
					factory.generatePublic(
							new X509EncodedKeySpec(Base64.getDecoder().decode(publicKeyString.getBytes())));
			final PrivateKey privateKey =
					factory.generatePrivate(
							new PKCS8EncodedKeySpec(Base64.getDecoder().decode(privateKeyString.getBytes())));
			KEY_PAIR = new KeyPair(publicKey, privateKey);
		} catch (Exception e) {
			throw new RuntimeException(e);
		}
	}

	@Test
	void testEndOfStreamBehavior() throws Exception {

		// Pick the client to use, inconsistent behavior between the two
		final S3Client client = getClient(DEFAULT_REGION);
		// final S3Client client = getEncryptionClient(KEY_PAIR, DEFAULT_REGION);

		// Delete the data if it exists
		final DeleteObjectRequest deleteRequest = DeleteObjectRequest.builder()
				.bucket(BUCKET)
				.key(KEY)
				.build();

		client.deleteObject(deleteRequest);

		// Upload the data
		final PutObjectRequest uploadRequest =
				PutObjectRequest.builder().bucket(BUCKET).key(KEY).build();
		client.putObject(uploadRequest, RequestBody.fromBytes(CONTENT));
		// wait for the data to be uploaded
		Thread.sleep(Duration.ofSeconds(5));

		// Actual test

		final GetObjectRequest downloadRequest =
				GetObjectRequest.builder()
						.bucket(BUCKET)
						.key(KEY)
						.range("bytes=0-15")
						.build();

		final InputStream stream = client.getObject(downloadRequest);

		// Buffer capacity matters !!!
		// Behavior difference when the capacity is same as the content length (i.e. 16) of the ranged query
		final ByteBuffer buffer = ByteBuffer.allocate(16);
		final byte[] underlyingBuffer = buffer.array();
		final int capacity = buffer.capacity();

		final int END_OF_STREAM = -1;
		int byteRead = 0;
		int startPosition = 0;
		while (byteRead != END_OF_STREAM) {
			int lenToRead = capacity - startPosition;
			System.out.println("Start position: " + startPosition + " Length to read: " + lenToRead);
                        // Difference of behavior here when reading the end of stream with a 0 lenToRead
			byteRead = stream.read(underlyingBuffer, startPosition, lenToRead);
			System.out.println("Read " + byteRead + " bytes");
			startPosition += byteRead;
			if (byteRead == 0) {
				System.out.println("Looping indefinitely with an encryption client, as startPosition is not increasing");
				break;
			}
		}
	}



	public static S3Client getEncryptionClient(final KeyPair keyPair, final Region region) {

		return S3EncryptionClient.builder()
				.rsaKeyPair(keyPair)
				.enableLegacyUnauthenticatedModes(true)
				.wrappedClient(getClient(region))
				.wrappedAsyncClient(getAsyncClient(region))
				.build();
	}


	public static S3Client getClient(final Region region) {

		return S3Client.builder()
				.region(region)
				.credentialsProvider(DefaultCredentialsProvider.create())
				.httpClientBuilder(
						ApacheHttpClient.builder().maxConnections(128) // Default is 50
				)
				.build();
	}


	public static S3AsyncClient getAsyncClient(final Region region) {

		final SdkAsyncHttpClient nettyHttpClient =
				NettyNioAsyncHttpClient.builder().maxConcurrency(100).build();

		return S3AsyncClient.builder()
				.region(region)
				.credentialsProvider(DefaultCredentialsProvider.create())
				.httpClient(nettyHttpClient)
				.build();
	}
}

I do not know which behavior is the correct one, but i wanted to report it as it is an inconsistency and a bug that can arise while migrating to the new SDK.

@texastony
Copy link
Contributor

@NathanEckert
Thank you for cutting the issue.

I ran your code; I cannot get your results.
Glance at the comments I put in this commit.

I had a hard time aligning your issue description with your code comments.
Your comments mention an infinite for loop;
I cannot re-create that.

I cannot detect any difference in behavior between
the S3EC V3 Client and the S3 V2 Client with respect to this code.

This repo is for S3 Encryption Client for Java V3 (S3EC-Java-3.x),
which only supports the AWS SDK for Java V2.

My understanding is that the AWS SDK for Java changed
streaming operations between V1 and V2,
as described here in the Developer Guidance
.

The objective of the S3EC-Java-3.x is to have API compatibility with the
AWS SDK for Java V2's S3 Client,
not any predecessors.

From what you have described,
I infer that the S3EC-Java-3.x's Streaming behavior is consistent with the AWS SDK for Java V2's S3 client,
but inconsistent with the older clients.

Is this correct? Is this newer vs older client behavior the root cause of this issue?

If not, can you tell us exactly what version of the AWS SDK for Java you are using,
and which version of the S3 Encryption Client?

Much Obliged,
AWS Crypto Tools

(Pending response from @NathanEckert)

@NathanEckert
Copy link
Author

I am using the same version as you.
I created a fully contained project to reproduce the issue: https://github.com/NathanEckert/AWS_crypto_tools_reproducer

texastony added a commit that referenced this issue Jun 24, 2024
The modern S3EC and SDK V2 treat reading 0 bytes
from a stream differently in at least one case:
If the Stream has no more content.
texastony added a commit that referenced this issue Jun 24, 2024
The modern S3EC and SDK V2 treat reading 0 bytes
from a stream differently in at least one case:
If the Stream has no more content.
texastony added a commit that referenced this issue Jun 24, 2024
The modern S3EC and SDK V2 treat reading 0 bytes
from a stream differently in at least one case:
If the Stream has no more content.
texastony added a commit that referenced this issue Jun 24, 2024
The modern S3EC and SDK V2 treat reading 0 bytes
from a stream differently in at least one case:
If the Stream has no more content.
@texastony
Copy link
Contributor

@NathanEckert thank you for that reproduction.

We probably were hitting your error, but the log statement was not emitting.
I have created a branch with a GitHub workflow explicitly for this issue.
#307

I am seeing other issues as well.
We will try to determine what behavior we want,
and either document it or fix it to be like the SDK's client.

Much Obliged,
Crypto Tools

@kessplas
Copy link
Contributor

Upon further investigation, this behavior is due to the implementation of the AWS SDK's InputStreamSubscriber. We have filed a bug with the SDK team; once the fix is published, we'll update the pom.xml of the S3EC accordingly. Thanks!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants