biom262 · spencerg27 · Jan 28, 2016
diff --git a/weeks/week03/homework.ipynb b/weeks/week03/homework.ipynb
@@ -23,7 +23,7 @@
    "cell_type": "raw",
    "metadata": {},
    "source": [
-    " "
+    "This is a one-tailed test because we expect the difference to be in a specific direction (in this case, a higher level with the A allele than with the B allele)."
    ]
   },
   {
@@ -36,7 +36,9 @@
   {
    "cell_type": "raw",
    "metadata": {},
-   "source": []
+   "source": [
+    "Because the data do not appear to be normally distributed, a non-parametric test such as the Wilcoxon rank-sum test would be most appropriate."
+   ]
   },
   {
    "cell_type": "markdown",
@@ -46,9 +48,60 @@
    ]
   },
   {
-   "cell_type": "raw",
+   "cell_type": "code",
+   "execution_count": 25,
+   "metadata": {
+    "collapsed": false,
+    "scrolled": true
+   },
+   "outputs": [
+    {
+     "data": {
+      "text/plain": [
+       "\n",
+       "\tShapiro-Wilk normality test\n",
+       "\n",
+       "data:  data_b$normal - data_c$normal\n",
+       "W = 0.97567, p-value = 0.9281\n"
+      ]
+     },
+     "execution_count": 25,
+     "metadata": {},
+     "output_type": "execute_result"
+    },
+    {
+     "data": {
+      "text/plain": [
+       "\n",
+       "\tWilcoxon rank sum test\n",
+       "\n",
+       "data:  data_b$normal and data_c$normal\n",
+       "W = 6, p-value = 0.06494\n",
+       "alternative hypothesis: true location shift is not equal to 0\n"
+      ]
+     },
+     "execution_count": 25,
+     "metadata": {},
+     "output_type": "execute_result"
+    }
+   ],
+   "source": [
+    "data <- read.table(file = \"adamts_B6.txt\", header = TRUE)\n",
+    "\n",
+    "data_b <- subset(data, subset = nxf1 == \"B\")\n",
+    "data_c <- subset(data, subset = nxf1 == \"C\")\n",
+    "\n",
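+    "shapiro.test(data_b$normal-data_c$normal)  # caution: B and C are independent groups, so checking normality within each group separately is more meaningful than testing their difference\n",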
+    "\n",
+    "wilcox.test(x = data_b$normal, y = data_c$normal)  # two-sided by default; set the alternative argument for a one-tailed test"
+   ]
+  },
+  {
+   "cell_type": "markdown",
    "metadata": {},
-   "source": []
+   "source": [
+    "Since the p-value (0.065) is larger than 0.05, we fail to reject the null hypothesis: there is no significant difference between the alleles at the 0.05 level."
+   ]
   },
   {
    "cell_type": "markdown",
@@ -60,9 +113,71 @@
    ]
   },
   {
-   "cell_type": "raw",
+   "cell_type": "code",
+   "execution_count": 33,
+   "metadata": {
+    "collapsed": false
+   },
+   "outputs": [
+    {
+     "name": "stderr",
+     "output_type": "stream",
+     "text": [
+      "Warning message:\n",
+      "In data_b$normal.f2 - data_c$normal.f2: longer object length is not a multiple of shorter object length"
+     ]
+    },
+    {
+     "data": {
+      "text/plain": [
+       "\n",
+       "\tShapiro-Wilk normality test\n",
+       "\n",
+       "data:  data_b$normal.f2 - data_c$normal.f2\n",
+       "W = 0.81146, p-value = 0.002215\n"
+      ]
+     },
+     "execution_count": 33,
+     "metadata": {},
+     "output_type": "execute_result"
+    },
+    {
+     "data": {
+      "text/plain": [
+       "\n",
+       "\tWelch Two Sample t-test\n",
+       "\n",
+       "data:  data_b$normal.f2 and data_c$normal.f2\n",
+       "t = -2.0766, df = 13.225, p-value = 0.02894\n",
+       "alternative hypothesis: true difference in means is less than 0\n",
+       "95 percent confidence interval:\n",
+       "          -Inf -0.0005380758\n",
+       "sample estimates:\n",
+       "  mean of x   mean of y \n",
+       "0.002613228 0.006241535 \n"
+      ]
+     },
+     "execution_count": 33,
+     "metadata": {},
+     "output_type": "execute_result"
+    }
+   ],
+   "source": [
+    "data <- read.table(file = \"adamts_balbF2.txt\", header = TRUE)\n",
+    "\n",
+    "data_b <- subset(data, subset = nxf1.f2 == \"B\")\n",
+    "data_c <- subset(data, subset = nxf1.f2 == \"C\")\n",
+    "\n",
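+    "shapiro.test(data_b$normal.f2-data_c$normal.f2)  # caution: the groups differ in length, so R recycles the shorter vector (see warning); check each group separately instead\n",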
+    "t.test(data_b$normal.f2, data_c$normal.f2, alternative = \"less\")"
+   ]
+  },
+  {
+   "cell_type": "markdown",
    "metadata": {},
-   "source": []
+   "source": [
+    "The Shapiro-Wilk p-value (0.0022) is below 0.05, so normality is rejected and a rank-based test would be safer than a t-test here. Since we expect the B allele to be higher, a one-tailed test is appropriate; however, alternative = \"less\" with B as x tests whether B is lower than C. The Welch test gives p = 0.029, and the sample means (B = 0.0026, C = 0.0062) show B lower than C, so the data do not support the B allele being higher than C."
+   ]
   },
   {
    "cell_type": "markdown",
@@ -74,7 +189,9 @@
   {
    "cell_type": "raw",
    "metadata": {},
-   "source": []
+   "source": [
+    "I conclude that Nxf1 may influence Gene 8 in a strain-specific fashion: the one-tailed test on the balbF2 data is significant (p = 0.029), while the B6 comparison is not (p = 0.065). The evidence for an effect is therefore not overwhelming, although the B6 p-value is close to the 0.05 threshold. Further experiments are warranted."
+   ]
   },
   {
    "cell_type": "markdown",
@@ -86,7 +203,9 @@
   {
    "cell_type": "raw",
    "metadata": {},
-   "source": []
+   "source": [
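+    "Parametric tests are more sensitive (more powerful) when their distributional assumptions, such as normality, hold. Non-parametric tests replace the raw values with ranks so that they do not depend on those assumptions and are less affected by outliers, at the cost of some power."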
+   ]
   },
   {
    "cell_type": "markdown",