Changes after sphinx run
tirthajyoti committed Jul 22, 2019
1 parent 4826d17 commit b71b1d7
Showing 8 changed files with 659 additions and 0 deletions.
4 changes: 4 additions & 0 deletions .gitignore
@@ -102,3 +102,7 @@ venv.bak/

# mypy
.mypy_cache/

_build
_static
_templates
2 changes: 2 additions & 0 deletions MANIFEST.in
@@ -0,0 +1,2 @@
# Include the license file
include LICENSE.txt
11 changes: 11 additions & 0 deletions README.md
@@ -1,3 +1,8 @@
# DOEPY (`pip install doepy`)
---
![doe-1](https://raw.githubusercontent.com/tirthajyoti/doepy/master/images/doe_1.PNG)
#### Authored and maintained by Dr. Tirthajyoti Sarkar, Fremont, CA 94536 (https://tirthajyoti.github.io)

## Introduction
[Design of Experiment (DOE)](https://en.wikipedia.org/wiki/Design_of_experiments) is an important activity for any scientist, engineer, or statistician planning to conduct experimental analysis. This exercise has become **critical in this age of the rapidly expanding field of data science and associated statistical modeling and machine learning**. A well-planned DOE can give a researcher a meaningful data set to act upon with an optimal number of experiments, preserving critical resources.

@@ -16,6 +21,8 @@ Need for careful design of experiment arises in all fields of serious scientific
### Options for open-source DOE builder package in Python?
Unfortunately, the majority of the state-of-the-art DOE generators are part of commercial statistical software packages like [JMP (SAS)](https://www.jmp.com/) or [Minitab](https://www.minitab.com/en-US/default.aspx). However, a researcher would surely benefit from open-source code that presents an intuitive user interface for generating an experimental design plan from a simple list of input variables. There are a couple of DOE builder Python packages, but individually they don’t cover all the necessary DOE methods and they lack a simplified user API, where one can just input a CSV file of input variables’ ranges and get back the DOE matrix in another CSV file.

---

## Features
This code base is a collection of functions that wrap around the core packages (mentioned below) and generate **design-of-experiment (DOE) matrices** for a statistician or engineer from an arbitrary range of input variables.

@@ -43,6 +50,8 @@ In this way, ***the only API user needs to be exposed to, are input and output C
* Halton sequence based,
* Uniform random matrix
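
As a rough illustration of the last design in this list, a uniform random matrix draws each variable independently and uniformly from its range. The sketch below uses only the Python standard library. The function name `uniform_random_design` and the example ranges are assumptions made for illustration, and this is not doepy's own implementation.

```python
import random


def uniform_random_design(ranges, num_samples, seed=None):
    """Draw num_samples runs, each variable uniform within its (low, high) range.

    Hypothetical helper for illustration only; not part of doepy.
    """
    rng = random.Random(seed)
    return [
        {name: rng.uniform(low, high) for name, (low, high) in ranges.items()}
        for _ in range(num_samples)
    ]


# Example min/max ranges (assumed values for a generic chemical process).
ranges = {
    "Pressure": (40, 70),
    "Temperature": (290, 350),
    "FlowRate": (0.2, 0.4),
    "Time": (5, 8),
}

runs = uniform_random_design(ranges, num_samples=5, seed=0)
for run in runs:
    print(run)
```

Passing a fixed `seed` keeps the table reproducible between runs, which is usually desirable when the design is shared with others.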

---

## How to use it?
### What supporting packages are required?
First make sure you have all the necessary packages installed. You can simply run the .bash (Unix/Linux) or .bat (Windows) file provided in the repo to install those packages from your command line interface. They contain the following commands,
@@ -108,6 +117,8 @@ read_write.write_csv(df_lhs,filename=filename)

You should see a `lhs.csv` file in your directory.
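
To make the CSV-in, CSV-out idea concrete without depending on doepy itself, here is a minimal standard-library sketch. It enumerates a full factorial table from the quick-start levels, writes it to a CSV file, and reads it back. The file name `design.csv` and the variable names are assumptions for illustration, and doepy's own `read_write` module may format its files differently.

```python
import csv
import itertools
import tempfile
from pathlib import Path

# Levels from the quick-start example (3 x 3 x 2 x 2).
levels = {
    "Pressure": [40, 55, 70],
    "Temperature": [290, 320, 350],
    "FlowRate": [0.2, 0.4],
    "Time": [5, 8],
}

# A full factorial design is every combination of the levels.
design = [dict(zip(levels, combo)) for combo in itertools.product(*levels.values())]

with tempfile.TemporaryDirectory() as tmp:
    csv_path = Path(tmp) / "design.csv"  # hypothetical output file name

    # Write one row per experimental run, with a header row of variable names.
    with csv_path.open("w", newline="") as handle:
        writer = csv.DictWriter(handle, fieldnames=list(levels))
        writer.writeheader()
        writer.writerows(design)

    # Read the table back; csv returns every value as a string.
    with csv_path.open(newline="") as handle:
        restored = list(csv.DictReader(handle))

print(len(design), len(restored))
```

Because the `csv` module reads values back as strings, any downstream tool consuming the file needs to convert them to numbers itself.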

---

## Acknowledgements and Requirements
The code was written in Python 3.7. It uses the following external packages, which need to be installed on your system to use it,
* pydoe: A package designed to help the scientist, engineer, statistician, etc., to construct appropriate experimental designs. [Check the docs here](https://pythonhosted.org/pyDOE/).
266 changes: 266 additions & 0 deletions README.rst
@@ -0,0 +1,266 @@
=======
DOEPY
=======
----------------------------------------------------------------------
A Python package for easily generating design of experiment tables
----------------------------------------------------------------------
.. image:: https://raw.githubusercontent.com/tirthajyoti/doepy/master/images/doe_1.PNG

Authored and maintained by `Dr. Tirthajyoti Sarkar <https://www.linkedin.com/in/tirthajyoti-sarkar-2127aa7/>`_, Fremont, California.

Check my website: https://tirthajyoti.github.io

Introduction
------------

`Design of Experiment
(DOE) <https://en.wikipedia.org/wiki/Design_of_experiments>`__ is an
important activity for any scientist, engineer, or statistician planning
to conduct experimental analysis. This exercise has become **critical in
this age of the rapidly expanding field of data science and associated
statistical modeling and machine learning**. A well-planned DOE can give
a researcher a meaningful data set to act upon with an optimal number of
experiments, preserving critical resources.

After all, the aim of data science is essentially to conduct the highest
quality scientific investigation and modeling with real-world data.
And to do good science with data, one needs to collect it through
carefully thought-out experiments that cover all corner cases and
reduce any possible bias.

What is a scientific experiment?
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

In its simplest form, a scientific experiment aims at predicting the
outcome by introducing a change of the preconditions, which is
represented by one or more `independent
variables <https://en.wikipedia.org/wiki/Dependent_and_independent_variables>`__,
also referred to as “input variables” or “predictor variables.” The
change in one or more independent variables is generally hypothesized to
result in a change in one or more `dependent
variables <https://en.wikipedia.org/wiki/Dependent_and_independent_variables>`__,
also referred to as “output variables” or “response variables.” The
experimental design may also identify `control
variables <https://en.wikipedia.org/wiki/Controlling_for_a_variable>`__
that must be held constant to prevent external factors from affecting
the results.

What is Experimental Design?
~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Experimental design involves not only the selection of suitable
independent, dependent, and control variables, but planning the delivery
of the experiment under statistically optimal conditions given the
constraints of available resources. There are multiple approaches for
determining the set of design points (unique combinations of the
settings of the independent variables) to be used in the experiment.

Main concerns in experimental design include the establishment of
`validity <https://en.wikipedia.org/wiki/Validity_%28statistics%29>`__,
`reliability <https://en.wikipedia.org/wiki/Reliability_%28statistics%29>`__,
and `replicability <https://en.wikipedia.org/wiki/Reproducibility>`__.
For example, these concerns can be partially addressed by carefully
choosing the independent variable, reducing the risk of measurement
error, and ensuring that the documentation of the method is sufficiently
detailed. Related concerns include achieving appropriate levels of
`statistical power <https://en.wikipedia.org/wiki/Statistical_power>`__
and
`sensitivity <https://en.wikipedia.org/wiki/Sensitivity_and_specificity>`__.

Need for careful design of experiment arises in all fields of serious
scientific, technological, and even social science
investigation — \ *computer science, physics, geology, political
science, electrical engineering, psychology, business marketing
analysis, financial analytics*, etc…

Options for open-source DOE builder package in Python?
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Unfortunately, the majority of the state-of-the-art DOE generators are part
of commercial statistical software packages like `JMP
(SAS) <https://www.jmp.com/>`__ or
`Minitab <https://www.minitab.com/en-US/default.aspx>`__. However, a researcher
would surely benefit from open-source code that
presents an intuitive user interface for generating an experimental
design plan from a simple list of input variables. There are a couple of
DOE builder Python packages, but individually they don’t cover all the
necessary DOE methods and they lack a simplified user API, where one can
just input a CSV file of input variables’ ranges and get back the DOE
matrix in another CSV file.

--------------

Features
--------

This code base is a collection of functions that wrap around the
core packages (mentioned below) and generate **design-of-experiment
(DOE) matrices** for a statistician or engineer from an arbitrary range
of input variables.

Limitation of the foundation packages used
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Neither of the core packages, which act as foundations to this repo, is
complete, in the sense that they do not cover all the functions needed
to generate the DOE tables that a design engineer may need while planning an
experiment. Also, they offer only low-level APIs, in the sense that their
standard output is normalized numpy arrays. It was felt that
users who may not be comfortable dealing with Python objects
directly should be able to take advantage of their functionality
through a simplified user interface.

Simplified user interface
~~~~~~~~~~~~~~~~~~~~~~~~~

**The user just needs to provide a simple CSV file with a single table of
variables and their ranges (2-level, i.e. min/max, or 3-level).** Some of
the functions work with a 2-level min/max range, while others need
3-level ranges from the user (low-mid-high). Logic is built into
the code to handle the case where the range input is not appropriate and to
generate levels by simple linear interpolation from the given input. The
code will generate the DOE as per the user's choice and write the matrix to
a CSV file on disk.

In this way, **the only API the user needs to be exposed to is the input and
output CSV files. These files can then be used in any engineering
simulator, software, or process-control module, or fed into process
equipment.**

Designs available
~~~~~~~~~~~~~~~~~

- Full factorial,
- 2-level fractional factorial,
- Plackett-Burman,
- Sukharev grid,
- Box-Behnken,
- Box-Wilson (Central-composite) with center-faced option,
- Box-Wilson (Central-composite) with center-inscribed option,
- Box-Wilson (Central-composite) with center-circumscribed option,
- Latin hypercube (simple),
- Latin hypercube (space-filling),
- Random k-means cluster,
- Maximin reconstruction,
- Halton sequence based,
- Uniform random matrix

--------------

How to use it?
--------------

What supporting packages are required?
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

First make sure you have all the necessary packages installed. You can
simply run the .bash (Unix/Linux) or .bat (Windows) file provided in
the repo to install those packages from your command line interface.
They contain the following commands,

::

    pip install numpy
    pip install pandas
    pip install pydoe
    pip install diversipy

How to install the package?
~~~~~~~~~~~~~~~~~~~~~~~~~~~

You can pip install the package!

``pip install doepy``

Quick start
~~~~~~~~~~~

Let's say you have a design problem with the following table of
parameter ranges. Imagine this as a generic example of a chemical
process in a manufacturing plant. You have 3 levels of ``Pressure``, 3
levels of ``Temperature``, 2 levels of ``FlowRate``, and 2 levels of
``Time``.

| ``Pressure``: 40/55/70
| ``Temperature``: 290/320/350
| ``FlowRate``: 0.2/0.4
| ``Time``: 5/8

First, import the ``build`` module from the package,

``from doepy import build``

Then, try a simple example by building a **full factorial design**. We will use the ``build.full_fact()`` function for this. You have to pass a dictionary object to the function which encodes your experimental data.

::

    build.full_fact({'Pressure':[40,55,70],'Temperature':[290, 320, 350],
                     'Flow rate':[0.2,0.4], 'Time':[5,8]})

If you build a full-factorial DOE out of this, you should get a table with 3 x 3 x 2 x 2 = 36 entries.

Other functions to try on
~~~~~~~~~~~~~~~~~~~~~~~~~

Try other functions like ``build.space_filling_lhs()`` to construct a
`space-filling Latin hypercube
design <https://en.wikipedia.org/wiki/Latin_hypercube_sampling>`__.

Or try from one of the following available design options...

- Full factorial: ``build.full_fact()``
- 2-level fractional factorial: ``build.frac_fact_res()``
- Plackett-Burman: ``build.plackett_burman()``
- Sukharev grid: ``build.sukharev()``
- Box-Behnken: ``build.box_behnken()``
- Box-Wilson (Central-composite) with center-faced option: ``build.central_composite()`` with ``face='ccf'`` option
- Box-Wilson (Central-composite) with center-inscribed option: ``build.central_composite()`` with ``face='cci'`` option
- Box-Wilson (Central-composite) with center-circumscribed option: ``build.central_composite()`` with ``face='ccc'`` option
- Latin hypercube (simple): ``build.lhs()``
- Latin hypercube (space-filling): ``build.space_filling_lhs()``
- Random k-means cluster: ``build.random_k_means()``
- Maximin reconstruction: ``build.maximin()``
- Halton sequence based: ``build.halton()``
- Uniform random matrix: ``build.uniform_random()``

Read from and write to CSV files
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Internally, you pass on a dictionary object and get back a Pandas
DataFrame. But, for reading from and writing to CSV files, you have to
use the ``read_write`` module of the package.

::

    from doepy import read_write
    data_in=read_write.read_variables_csv('../Data/params.csv')

Then you can use this ``data_in`` object in the DOE generating
functions.

For writing back to a CSV,

::

    df_lhs=build.space_filling_lhs(data_in,num_samples=100)
    filename = 'lhs'
    read_write.write_csv(df_lhs,filename=filename)

You should see a ``lhs.csv`` file in your directory.

--------------

Acknowledgements and Requirements
---------------------------------

The code was written in Python 3.7. It uses the following external packages,
which need to be installed on your system to use it,

- ``pydoe``: A package designed to help the scientist, engineer,
statistician, etc., to construct appropriate experimental designs.
`Check the docs here <https://pythonhosted.org/pyDOE/>`__.
- ``diversipy``: A collection of algorithms for sampling in hypercubes,
selecting diverse subsets, and measuring diversity. `Check the docs
here <https://www.simonwessing.de/diversipy/doc/>`__.
- ``numpy``
- ``pandas``
20 changes: 20 additions & 0 deletions docs/Makefile
@@ -0,0 +1,20 @@
# Minimal makefile for Sphinx documentation
#

# You can set these variables from the command line, and also
# from the environment for the first two.
SPHINXOPTS ?=
SPHINXBUILD ?= sphinx-build
SOURCEDIR = .
BUILDDIR = _build

# Put it first so that "make" without argument is like "make help".
help:
	@$(SPHINXBUILD) -M help "$(SOURCEDIR)" "$(BUILDDIR)" $(SPHINXOPTS) $(O)

.PHONY: help Makefile

# Catch-all target: route all unknown targets to Sphinx using the new
# "make mode" option. $(O) is meant as a shortcut for $(SPHINXOPTS).
%: Makefile
	@$(SPHINXBUILD) -M $@ "$(SOURCEDIR)" "$(BUILDDIR)" $(SPHINXOPTS) $(O)
55 changes: 55 additions & 0 deletions docs/conf.py
@@ -0,0 +1,55 @@
# Configuration file for the Sphinx documentation builder.
#
# This file only contains a selection of the most common options. For a full
# list see the documentation:
# http://www.sphinx-doc.org/en/master/config

# -- Path setup --------------------------------------------------------------

# If extensions (or modules to document with autodoc) are in another directory,
# add these directories to sys.path here. If the directory is relative to the
# documentation root, use os.path.abspath to make it absolute, like shown here.
#
# import os
# import sys
# sys.path.insert(0, os.path.abspath('.'))


# -- Project information -----------------------------------------------------

project = 'doepy'
copyright = '2019, Tirthajyoti Sarkar'
author = 'Tirthajyoti Sarkar'

# The full version, including alpha/beta/rc tags
release = '0.0.1'


# -- General configuration ---------------------------------------------------

# Add any Sphinx extension module names here, as strings. They can be
# extensions coming with Sphinx (named 'sphinx.ext.*') or your custom
# ones.
extensions = [
]

# Add any paths that contain templates here, relative to this directory.
templates_path = ['_templates']

# List of patterns, relative to source directory, that match files and
# directories to ignore when looking for source files.
# This pattern also affects html_static_path and html_extra_path.
exclude_patterns = ['_build', 'Thumbs.db', '.DS_Store']


# -- Options for HTML output -------------------------------------------------

# The theme to use for HTML and HTML Help pages. See the documentation for
# a list of builtin themes.
#
html_theme = 'alabaster'

# Add any paths that contain custom static files (such as style sheets) here,
# relative to this directory. They are copied after the builtin static files,
# so a file named "default.css" will overwrite the builtin "default.css".
html_static_path = ['_static']