The SCM Inventory module automates the deployment of the resources needed to scan source code management (SCM) platforms and pull an inventory from them. It currently supports pulling the repositories of GitHub organizations, along with their issues and pull requests, to generate and maintain an inventory.
By default, the inventory also records the top 5 languages and the top 5 topics used in each repository. This information can be customized to include additional data.
This Terraform module provisions an AWS EC2 instance, configures it with the necessary permissions, and sets up a workflow that fetches GitHub inventory data and pushes it to an S3 bucket. The module is designed to be flexible and can be customized to support additional SCM platforms and data sources.
- GitHub: For more information, see the Python module github_inventory stored in this repository.
- AWS CLI configured with appropriate credentials
- Access to an AWS account with permissions to create EC2 instances, IAM roles, policies, and S3 buckets
- A GitHub token with permissions to access the repositories and organizations you wish to scan
Configure AWS Credentials
Ensure your AWS CLI is configured with credentials that have the necessary permissions to create the resources defined in this module.
Prepare GitHub Token
Store your GitHub token in AWS Secrets Manager. Note the ARN of the secret as it will be used in the Terraform variables.
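If you prefer to manage the secret with Terraform rather than the console or CLI, the following is a minimal sketch of one way to do it in a separate configuration. The variable name github_token, the secret name, and the output name are hypothetical, and the sketch assumes the token is stored as a plain string; check the inventory script for the format it actually expects. Note that a secret created this way is also written to the Terraform state, so protect that state accordingly.

```hcl
# Hypothetical input holding the token; pass it via TF_VAR_github_token
# or another secure mechanism rather than committing it to a file.
variable "github_token" {
  description = "GitHub token to store in AWS Secrets Manager"
  type        = string
  sensitive   = true
}

# The secret container. The name below is an illustrative placeholder.
resource "aws_secretsmanager_secret" "github_token" {
  name = "scm-inventory/github-token"
}

# The secret value (assumed here to be the raw token string).
resource "aws_secretsmanager_secret_version" "github_token" {
  secret_id     = aws_secretsmanager_secret.github_token.id
  secret_string = var.github_token
}

# Expose the ARN so it can be copied into the module's variables.
output "github_token_secret_arn" {
  value = aws_secretsmanager_secret.github_token.arn
}
```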
Set Terraform Variables
Customize the Terraform variables defined in the variables.tf file or provide a terraform.tfvars file with your specific values.
We recommend setting the variables in a terraform.tfvars file based on the provided terraform.tfvars.example file.
Key variables include:
- aws_profile: The AWS profile to use for authentication.
- aws_region: The AWS region where resources will be deployed.
- s3_bucket_name: The name of the S3 bucket where the inventory will be stored. This bucket must be created beforehand.
- github_token_secret_name: The ARN of the AWS Secrets Manager secret containing your GitHub token. This secret must be provisioned separately.
- project_name: A name for your project.
- scanned_org: The GitHub organization you wish to scan.
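As an illustration, a terraform.tfvars file covering the key variables above might look like the sketch below. Every value is a placeholder chosen for this example (the profile, bucket, organization, account ID, and secret ARN are all hypothetical), except where a comment notes that it matches the module default.

```hcl
# terraform.tfvars (illustrative values only; replace with your own)

aws_profile = "my-aws-profile" # hypothetical AWS CLI profile name
aws_region  = "us-east-1"      # matches the module default

# Hypothetical bucket name; the bucket must already exist.
s3_bucket_name = "my-scm-inventory-bucket"

# Hypothetical ARN of the Secrets Manager secret holding the GitHub token.
github_token_secret_name = "arn:aws:secretsmanager:us-east-1:123456789012:secret:github-token-AbCdEf"

project_name = "secrets-detection" # matches the module default
scanned_org  = "my-github-org"     # hypothetical GitHub organization
```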
Initialize Terraform
Run terraform init in the infrastructure/inventory/aws/scm-inventory/ directory to initialize the Terraform project.
Apply Terraform Configuration
Execute terraform apply to create the resources. Review the plan and confirm the action.
Access the Inventory
Once the EC2 instance completes its run, the generated inventory will be available in the specified S3 bucket. The instance can be configured to terminate automatically after completion.
Additional Notes
The EC2 instance uses the t2.micro instance type by default, but this can be adjusted to your needs through the instance_type variable. The default is deliberately small to avoid unnecessary costs.
It is also possible to keep the EC2 instance running after the inventory has been generated, which can be useful for debugging. To do so, set the terminate_instance_after_completion variable to false.
The module supports optional fetching of issues and pull requests from the scanned GitHub organizations by setting the fetch_issues and fetch_pr variables.
The inventory script is located in the scripts/inventory/github_inventory directory.
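The optional settings described in these notes can be combined in terraform.tfvars. The sketch below is only an example: the variable names come from this module, while the values (including the t3.small instance type) are arbitrary choices for illustration.

```hcl
# Optional overrides in terraform.tfvars (example values)

# Use a larger instance than the t2.micro default, for example for big organizations.
instance_type = "t3.small"

# Keep the instance running after the scan so it can be inspected for debugging.
terminate_instance_after_completion = false

# Also collect issues and pull requests (both default to false).
fetch_issues = true
fetch_pr     = true
```

Remember to set terminate_instance_after_completion back to true, or destroy the instance manually, once debugging is finished so the instance does not keep accruing costs.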
For detailed information on the resources created and managed by this module, refer to the automatically generated documentation below.
Name | Version |
---|---|
terraform | >=1.7 |
aws | ~> 5.0 |
Name | Version |
---|---|
aws | ~> 5.0 |
local | n/a |
null | n/a |
No modules.
Name | Type |
---|---|
aws_iam_instance_profile.ec2_instance_profile | resource |
aws_iam_policy.permissions_for_ec2_instance | resource |
aws_iam_policy.s3_access_policy | resource |
aws_iam_role.ec2_role | resource |
aws_iam_role_policy_attachment.PermissionsForEC2InstancePolicyAttachment | resource |
aws_instance.ec2_inventory | resource |
aws_s3_object.poetry_dist | resource |
null_resource.poetry_build | resource |
aws_ami.amazon_ami | data source |
aws_caller_identity.current | data source |
aws_iam_policy_document.ec2_assume_role | data source |
aws_iam_policy_document.policy_document_permissions_for_ec2_instance | data source |
aws_iam_policy_document.s3_access_policy_document | data source |
aws_s3_bucket.resources_and_results | data source |
aws_secretsmanager_secret.github_token_secret | data source |
aws_security_group.default | data source |
aws_security_groups.custom_security_groups | data source |
aws_subnet.selected | data source |
aws_subnets.default | data source |
aws_vpc.selected | data source |
local_file.dist | data source |
Name | Description | Type | Default | Required |
---|---|---|---|---|
ami_image_filter | Filter to use to find the Amazon Machine Image (AMI) to use for the EC2 instance the name can contain wildcards. Only GNU/Linux images are supported. | string | "amzn2-ami-hvm*" | no |
ami_owner | Owner of the Amazon Machine Image (AMI) to use for the EC2 instance | string | "amazon" | no |
aws_default_security_groups_filters | Filters to use to find the default security groups | list(string) | [] | no |
aws_profile | AWS profile to use for authentication | string | n/a | yes |
aws_region | AWS region where to deploy resources | string | "us-east-1" | no |
ec2_workdir | Working directory for the EC2 instance | string | "~/github-inventory" | no |
environment_type | Environment (PRODUCTION, PRE-PRODUCTION, QUALITY ASSURANCE, INTEGRATION TESTING, DEVELOPMENT, LAB) | string | "PRODUCTION" | no |
fetch_issues | Indicates whether to fetch issues for the repositories | bool | false | no |
fetch_pr | Indicates whether to fetch pull requests for the repositories | bool | false | no |
github_token_secret_name | SSM parameter name containing the GitHub token of the Service Account | string | n/a | yes |
instance_type | Instance type to use for fetching the inventory | string | "t2.micro" | no |
inventory_project_dir | Path to the directory containing the inventory project | string | "../../../../scripts/inventory/github_inventory" | no |
permissions_boundary_arn | Permissions boundary to use for the IAM role | string | null | no |
project_name | Name of the project | string | "secrets-detection" | no |
project_version | Version of the project | string | "0.1.0" | no |
s3_bucket_name | S3 bucket name where to upload the scripts and results | string | n/a | yes |
scanned_org | Name of the organization to scan | string | n/a | yes |
subnet_name | Filter to select the subnet to use, this can use wildcards. | string | null | no |
tags | A map of tags to add to the resources | map(string) | {} | no |
terminate_instance_after_completion | Indicates whether the instance should be terminated once the scan has finished (set to false for debugging purposes) | bool | true | no |
vpc_name | Filter to select the VPC to use, this can use wildcards. | string | "" | no |
Name | Description |
---|---|
ec2_instance_arn | n/a |
ec2_instance_id | n/a |
ec2_role_arn | n/a |
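The outputs above can be read from another configuration if this directory is consumed as a child module. The sketch below assumes that usage works for your setup, which this document does not confirm; the module label scm_inventory, the relative source path, the output names, and all input values are hypothetical.

```hcl
# Hypothetical root configuration that calls this module.
module "scm_inventory" {
  # Assumed relative path from the calling configuration to this module.
  source = "./infrastructure/inventory/aws/scm-inventory"

  # Required inputs (placeholder values).
  aws_profile              = "my-aws-profile"
  s3_bucket_name           = "my-scm-inventory-bucket"
  github_token_secret_name = "arn:aws:secretsmanager:us-east-1:123456789012:secret:github-token-AbCdEf"
  scanned_org              = "my-github-org"
}

# Re-export the module outputs so they appear after terraform apply.
output "inventory_instance_id" {
  value = module.scm_inventory.ec2_instance_id
}

output "inventory_role_arn" {
  value = module.scm_inventory.ec2_role_arn
}
```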