Commit

Initial commit
AyushSVarma committed Sep 11, 2020
0 parents commit 3094113
Showing 12 changed files with 845 additions and 0 deletions.
183 changes: 183 additions & 0 deletions .gitignore
@@ -0,0 +1,183 @@
# Created by https://www.gitignore.io/api/linux,macos,python,windows
# Edit at https://www.gitignore.io/?templates=linux,macos,python,windows

### Linux ###
*~

# temporary files which can be created if a process still has a handle open of a deleted file
.fuse_hidden*

# KDE directory preferences
.directory

# Linux trash folder which might appear on any partition or disk
.Trash-*

# .nfs files are created when an open file is removed but is still being accessed
.nfs*

### macOS ###
# General
.DS_Store
.AppleDouble
.LSOverride

# Icon must end with two \r
Icon

# Thumbnails
._*

# Files that might appear in the root of a volume
.DocumentRevisions-V100
.fseventsd
.Spotlight-V100
.TemporaryItems
.Trashes
.VolumeIcon.icns
.com.apple.timemachine.donotpresent

# Directories potentially created on remote AFP share
.AppleDB
.AppleDesktop
Network Trash Folder
Temporary Items
.apdisk

### Python ###
# Byte-compiled / optimized / DLL files
__pycache__/
*.py[cod]
*$py.class

# C extensions
*.so

# Distribution / packaging
.Python
build/
develop-eggs/
dist/
downloads/
eggs/
.eggs/
lib/
lib64/
parts/
sdist/
var/
wheels/
pip-wheel-metadata/
share/python-wheels/
*.egg-info/
.installed.cfg
*.egg
MANIFEST

# PyInstaller
# Usually these files are written by a python script from a template
# before PyInstaller builds the exe, so as to inject date/other infos into it.
*.manifest
*.spec

# Installer logs
pip-log.txt
pip-delete-this-directory.txt

# Unit test / coverage reports
htmlcov/
.tox/
.nox/
.coverage
.coverage.*
.cache
nosetests.xml
coverage.xml
*.cover
.hypothesis/
.pytest_cache/

# Translations
*.mo
*.pot

# Scrapy stuff:
.scrapy

# Sphinx documentation
docs/_build/

# PyBuilder
target/

# pyenv
.python-version

# pipenv
# According to pypa/pipenv#598, it is recommended to include Pipfile.lock in version control.
# However, in case of collaboration, if having platform-specific dependencies or dependencies
# having no cross-platform support, pipenv may install dependencies that don't work, or not
# install all needed dependencies.
#Pipfile.lock

# celery beat schedule file
celerybeat-schedule

# SageMath parsed files
*.sage.py

# Spyder project settings
.spyderproject
.spyproject

# Rope project settings
.ropeproject

# Mr Developer
.mr.developer.cfg
.project
.pydevproject

# mkdocs documentation
/site

# mypy
.mypy_cache/
.dmypy.json
dmypy.json

# Pyre type checker
.pyre/

### Windows ###
# Windows thumbnail cache files
Thumbs.db
Thumbs.db:encryptable
ehthumbs.db
ehthumbs_vista.db

# Dump file
*.stackdump

# Folder config file
[Dd]esktop.ini

# Recycle Bin used on file shares
$RECYCLE.BIN/

# Windows Installer files
*.cab
*.msi
*.msix
*.msm
*.msp

# Windows shortcuts
*.lnk

# End of https://www.gitignore.io/api/linux,macos,python,windows

**/pre-processing-code/*/
**/pre-processing-code.zip

response.json
54 changes: 54 additions & 0 deletions CONTRIBUTING.md
@@ -0,0 +1,54 @@
# Contributing to a Rearc AWS Data Exchange product

🎉🥳 First of all, **THANK YOU** for being interested in contributing to one of our projects 🎉🥳

### Table of Contents
- [What should I know before getting started?](#what-should-i-know-before-getting-started)
  * [What is AWS Data Exchange?](#what-is-aws-data-exchange)
  * [Who is Rearc?](#who-is-rearc)
  * [What are Rearc's goals for ADX?](#what-are-rearcs-goals-for-adx)
  * [What is Rearc's philosophy towards dataset formats?](#what-is-rearcs-philosophy-towards-dataset-formats)
  * [What tools are you using throughout your ADX Products?](#what-tools-are-you-using-throughout-your-adx-products)
- [How can I contribute?](#how-can-i-contribute)
  * [Report an Issue/Bug or Submit an Improvement/Suggestion](#report-an-issuebug-or-submit-an-improvementsuggestion)
  * [Pull Request](#pull-request)
- [Additional Resources](#additional-resources)

## What should I know before getting started?

#### What is AWS Data Exchange?
> [AWS Data Exchange](https://aws.amazon.com/data-exchange/) is a data marketplace that makes it easy for AWS customers to securely find, subscribe to, and use third-party data in the cloud.
#### Who is Rearc?
[Rearc](https://www.rearc.io) is a data provider and one of the launch partners for AWS Data Exchange. Products published by Rearc on ADX can be found [here](https://aws.amazon.com/marketplace/seller-profile?id=a8a86da2-b2d1-4fae-992d-03494e90590b). We automate the (1) sourcing, (2) transformation, (3) creation, (4) revision, and (5) publishing of datasets through ADX.

#### What are Rearc's goals for ADX?
We at Rearc are working tirelessly to make interesting and important datasets from various disciplines and sources more accessible. Because ADX integrates directly with other AWS services, it gives our subscribers a convenient way to consume data. For data providers, we can supply an automation pipeline built on the AWS platform to make your data widely available to your consumers.

#### What is Rearc's philosophy towards dataset formats?
We try as much as possible to preserve the integrity of the data we provide through ADX, and most of the time this means delivering datasets exactly as their source presents them. Sometimes we make minor alterations to datasets to provide wider usability for ADX subscribers (e.g. adjusting CSV files for SQL column naming conventions). When we are unable to maintain the original data file format, we try to limit the extent of transformations as much as possible.
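
As an illustration of the kind of minor alteration mentioned above, here is a minimal sketch of renaming CSV headers to SQL-friendly column names. The function names `to_sql_column_name` and `normalize_header` and the exact naming rules (lowercase, underscores, no leading digit) are assumptions made for this example and are not taken from any Rearc product.

```python
import re


def to_sql_column_name(header: str) -> str:
    """Turn one CSV header into a lowercase, underscore-separated identifier."""
    name = header.strip().lower()
    # Collapse every run of non-alphanumeric characters into a single underscore.
    name = re.sub(r"[^a-z0-9]+", "_", name).strip("_")
    if not name:
        name = "column"
    # Many SQL dialects reject identifiers that start with a digit.
    if name[0].isdigit():
        name = "_" + name
    return name


def normalize_header(headers):
    """Rename a full header row, numbering repeated names (id, id_2, ...)."""
    seen = {}
    result = []
    for header in headers:
        base = to_sql_column_name(header)
        count = seen.get(base, 0)
        seen[base] = count + 1
        result.append(base if count == 0 else f"{base}_{count + 1}")
    return result


print(normalize_header(["Country Name", "2020 Total (%)", "ID", "id"]))
# prints ['country_name', '_2020_total', 'id', 'id_2']
```

Only the header row changes in a transformation like this; the data rows are passed through untouched, which keeps the alteration as small as possible.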

#### What tools are you using throughout your ADX Products?
- Our ADX products are primarily built with [Python 3](https://www.python.org), and use AWS [CloudFormation](https://docs.aws.amazon.com/cloudformation/) and [Lambda](https://docs.aws.amazon.com/lambda/) resources to offer automated revisions.
- As no two datasets are the same, the exact tools utilized vary on a project-by-project basis.

For more details on the technologies used in our ADX products, please visit [Getting started with publishing a data product on AWS Data Exchange](https://github.com/rearc-data/publish-a-data-product-on-aws-data-exchange).
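
To give a rough idea of what an automated revision can involve, the sketch below creates a revision, imports objects from S3, and finalizes the revision. It is not the code used in this repository: the function name `publish_revision` and its parameters are hypothetical, and `client` is assumed to behave like `boto3.client("dataexchange")`.

```python
import time


def publish_revision(client, data_set_id, bucket, keys, comment="", poll_seconds=5):
    """Create an ADX revision, import S3 objects into it, then finalize it.

    `client` is assumed to expose the same methods as
    boto3.client("dataexchange"); it is passed in so it can be swapped out.
    """
    revision_id = client.create_revision(DataSetId=data_set_id, Comment=comment)["Id"]

    # One import job brings all listed S3 objects into the new revision.
    job_id = client.create_job(
        Type="IMPORT_ASSETS_FROM_S3",
        Details={
            "ImportAssetsFromS3": {
                "DataSetId": data_set_id,
                "RevisionId": revision_id,
                "AssetSources": [{"Bucket": bucket, "Key": key} for key in keys],
            }
        },
    )["Id"]
    client.start_job(JobId=job_id)

    # Wait for the import to finish before finalizing the revision.
    while True:
        state = client.get_job(JobId=job_id)["State"]
        if state == "COMPLETED":
            break
        if state in ("ERROR", "CANCELLED", "TIMED_OUT"):
            raise RuntimeError(f"import job {job_id} ended in state {state}")
        time.sleep(poll_seconds)

    client.update_revision(DataSetId=data_set_id, RevisionId=revision_id, Finalized=True)
    return revision_id
```

Inside a Lambda function, `client` would typically be created with `boto3.client("dataexchange")`, and the data set ID, bucket, and keys would come from the function's configuration or event.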

## How can I contribute?

#### Report an Issue/Bug or Submit an Improvement/Suggestion
If you have feedback specific to the ADX product featured in this repository, the best way to contact us is by [opening a GitHub issue]() in this repository. Before opening an issue, please review the existing issues to see if your idea is already there. If it is, please comment on the existing issue instead of creating a new one.

When opening an issue, please **be as descriptive as possible**. If relevant, please **provide information regarding your use-case, development configuration and environment**. The more specific you can be, the easier it will be for us to identify and address the situation.

If you have a general inquiry about Rearc's data services, you can send an email to data@rearc.io. We would love to hear any suggestion, question or request you may have.

#### Pull Request
We actively encourage you to fork, branch and open a pull request on this repository! Before opening a pull request, please familiarize yourself with the [tools](#what-tools-are-you-using-throughout-your-adx-products) used in our ADX Products. If you are looking to improve the project's included datasets, start with the [`pre-processing/pre-processing-code`](./pre-processing/pre-processing-code) folder, as this is where the gathering and transforming of data occurs.

When you are ready to open a pull request, please **be as descriptive as possible** regarding all improvements you have made. After reviewing your pull request, we may ask you to make additional changes before it is accepted. If we are unable to accept your pull request, we will make sure to offer context for our decision.

## Additional Resources
- [Rearc Data Homepage](https://www.rearc.io/data)
- Rearc Data Email: data@rearc.io
- [Rearc AWS Marketplace Profile](https://aws.amazon.com/marketplace/seller-profile?id=a8a86da2-b2d1-4fae-992d-03494e90590b)
27 changes: 27 additions & 0 deletions README.md
@@ -0,0 +1,27 @@
<a href="https://www.rearc.io/data/">
<img src="./rearc_logo_rgb.png" alt="Rearc Logo" title="Rearc Logo" height="52" />
</a>

#

You can subscribe to the AWS Data Exchange product utilizing the automation featured in this repository by visiting []().

## Main Overview

#### Data Source

## More Information
- Source:
-
-
- Frequency:
- Formats:

## Contact Details
- If you find any issues with or have enhancement ideas for this product, open up a GitHub [issue]() and we will gladly take a look at it. Better yet, submit a pull request. Any contributions you make are greatly appreciated :heart:.
- If you are looking for specific open datasets currently not available on ADX, please submit a request on our project board [here]().
- If you have questions about the source data, please contact .
- If you have any other questions or feedback, send us an email at data@rearc.io.

## About Rearc
Rearc is a cloud, software and services company. We believe that empowering engineers drives innovation. Cloud-native architectures, modern software and data practices, and the ability to safely experiment can enable engineers to realize their full potential. We have partnered with several enterprises and startups to help them achieve agility. Our approach is simple — empower engineers with the best tools possible to make an impact within their industry.
Empty file added dataset-description.md
Empty file.