CharliesCodes/exploring-genes-from-expression-atlas

Life Science Computing

Dataset

"Transcription profiling by high throughput sequencing in different developmental stages of Zea mays subsp. mays tissues"
Set: RNA-Seq mRNA baseline
Organism: Zea mays
Data Source: Expression Atlas EBI dataset

Zea Mays


Table of Contents

  1. Dataset
  2. Table of Contents
  3. Preparation
    1. Imports
    2. Load Data
  4. Exploring and Processing the Experimental Data
    1. Print all Columns
    2. Check if Columns are equal
    3. Delete Data
    4. Exploring the Leftover-Data
    5. Interim Conclusions on the Experiment Data 🏁
  5. Exploring and Processing the Results
    1. Print all Columns
    2. Check if Columns are equal
    3. Delete Data
    4. Exploring the Leftover-Data
    5. Interim Conclusions on the Result Data 🏁
    6. Principal Component Analysis
    7. PCA Results 🏁

Preparation

Imports

import matplotlib.pyplot as plt
import matplotlib as mpl
import numpy as np
import pandas as pd

Load Data

exp_design = pd.read_csv('E-MTAB-4342-experiment-design.tsv', sep='\t', header=0)
results = pd.read_csv('E-MTAB-4342-query-results.fpkms.tsv', sep='\t', header=4)
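The `header=4` argument tells pandas that the fifth line of the file holds the column names, so the four lines above it are skipped. The sketch below illustrates this on an in-memory file; the preamble lines and the two data rows are made up for illustration and are not the real contents of the Expression Atlas download.

```python
import io
import pandas as pd

# Hypothetical file contents: four preamble lines, then the real header row.
text = (
    "# preamble line 1\n"
    "# preamble line 2\n"
    "# preamble line 3\n"
    "# preamble line 4\n"
    "Gene ID\tGene Name\tleaf\n"
    "g1\tg1\t2.0\n"
    "g2\tg2\t\n"
)

# header=4 means zero-based line 4 is the header; lines 0-3 are ignored.
demo = pd.read_csv(io.StringIO(text), sep="\t", header=4)
print(demo.columns.tolist())
print(demo)
```

The empty last field of the second data row is read as `NaN`, which is how missing values appear in the results table further down.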

Exploring and Processing the Experimental Data

Print all Columns

exp_design.columns
Index(['Run', 'Sample Characteristic[cultivar]',
       'Sample Characteristic Ontology Term[cultivar]',
       'Sample Characteristic[developmental stage]',
       'Sample Characteristic Ontology Term[developmental stage]',
       'Sample Characteristic[organism]',
       'Sample Characteristic Ontology Term[organism]',
       'Sample Characteristic[organism part]',
       'Sample Characteristic Ontology Term[organism part]',
       'Factor Value[developmental stage]',
       'Factor Value Ontology Term[developmental stage]',
       'Factor Value[organism part]',
       'Factor Value Ontology Term[organism part]', 'Analysed'],
      dtype='object')

Check if Columns are equal

exp_design['Sample Characteristic Ontology Term[developmental stage]'].equals(exp_design['Factor Value Ontology Term[developmental stage]'])
True
exp_design['Sample Characteristic Ontology Term[organism part]'].equals(exp_design['Factor Value Ontology Term[organism part]'])
True
exp_design['Sample Characteristic[organism part]'].equals(exp_design['Factor Value[organism part]'])
True
exp_design['Sample Characteristic[developmental stage]'].equals(exp_design['Factor Value[developmental stage]'])
True
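The four manual comparisons above could also be automated. The following is a minimal sketch, assuming a hypothetical helper `duplicate_column_pairs` and a small made-up frame in place of `exp_design`; it reports every pair of columns whose contents are identical.

```python
from itertools import combinations
import pandas as pd

def duplicate_column_pairs(df):
    """Return (first, second) name pairs for columns with identical content."""
    return [(a, b) for a, b in combinations(df.columns, 2) if df[a].equals(df[b])]

# Made-up stand-in for exp_design.
demo = pd.DataFrame({
    "Sample Characteristic[organism part]": ["leaf", "seed", "leaf"],
    "Factor Value[organism part]": ["leaf", "seed", "leaf"],
    "Run": ["r1", "r2", "r3"],
})
pairs = duplicate_column_pairs(demo)
print(pairs)
```

The second name of each pair is a candidate for deletion, which is what the "Delete Data" step does by hand.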

Delete Data

Delete all Columns with fewer than 2 different Values

for col in exp_design.columns:
    if len(exp_design[col].unique()) == 1:
        del exp_design[col]
exp_design.columns
Index(['Run', 'Sample Characteristic[developmental stage]',
       'Sample Characteristic Ontology Term[developmental stage]',
       'Sample Characteristic[organism part]',
       'Sample Characteristic Ontology Term[organism part]',
       'Factor Value[developmental stage]',
       'Factor Value Ontology Term[developmental stage]',
       'Factor Value[organism part]',
       'Factor Value Ontology Term[organism part]', 'Analysed'],
      dtype='object')
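An alternative that avoids deleting columns while looping over them is to build the list of columns to keep first. This is only a sketch with a hypothetical helper name and made-up data; `nunique(dropna=False)` is used so that `NaN` counts as a value, which matches `len(Series.unique())` in the loop above.

```python
import pandas as pd

def drop_constant_columns(df):
    """Return a new frame without columns that hold a single distinct value."""
    # dropna=False counts NaN as a value, like len(Series.unique()).
    keep = [col for col in df.columns if df[col].nunique(dropna=False) > 1]
    return df[keep]

# Made-up stand-in for exp_design.
demo = pd.DataFrame({
    "cultivar": ["B73", "B73", "B73"],
    "organism part": ["leaf", "seed", "leaf"],
})
reduced = drop_constant_columns(demo)
print(reduced.columns.tolist())
```

Because the helper returns a new frame, the original stays available for comparison.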

Delete duplicate Columns

del exp_design['Factor Value Ontology Term[developmental stage]']
del exp_design['Factor Value Ontology Term[organism part]']
del exp_design['Factor Value[organism part]']
del exp_design['Factor Value[developmental stage]']

Delete Sample Characteristic Ontology Columns

del exp_design['Sample Characteristic Ontology Term[developmental stage]']
del exp_design['Sample Characteristic Ontology Term[organism part]']

Delete useless Cols

del exp_design['Analysed']
del exp_design['Run']
exp_design = exp_design.rename(columns={"Sample Characteristic[developmental stage]": "developmental stage", "Sample Characteristic[organism part]": "organism part"})
exp_design
      developmental stage          organism part
0     6 days after pollination     leaf
1     6 days after pollination     leaf
2     6 days after anthesis        leaf
3     12 days after pollination    leaf
4     12 days after pollination    leaf
...   ...                          ...
265   16 days after pollination    endosperm
266   16 days after pollination    plant embryo
267   16 days after pollination    plant embryo
268   16 days after pollination    plant embryo
269   18 days after pollination    seed

270 rows × 2 columns

Exploring the Leftover-Data

print(exp_design["organism part"].nunique(), exp_design["developmental stage"].nunique())
44 32
exp_design.describe()
         developmental stage    organism part
count    267                    270
unique   32                     44
top      7 days after sowing    seed
freq     24                     33
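`describe()` reports a count of 267 for `developmental stage` against 270 rows, so three rows have no value in that column. A minimal sketch for locating such rows, using a small made-up frame in place of `exp_design`:

```python
import pandas as pd

# Made-up stand-in for exp_design; None marks a missing stage.
demo = pd.DataFrame({
    "developmental stage": ["6 days after pollination", None, "7 days after sowing"],
    "organism part": ["leaf", "anther", "root"],
})

# Missing values per column, then the affected rows themselves.
missing_per_column = demo.isna().sum()
rows_without_stage = demo[demo["developmental stage"].isna()]
print(missing_per_column)
print(rows_without_stage)
```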

Interim Conclusions on the Experiment Data 🏁

Cultivar

To avoid skewing the results through differences in genetic makeup, an inbred line of the cultivar B73 was bred. The inbred line ensures that all sampled plants share the same genetic background.

Sampling Site / Organism Part

Expression was also to be measured in different parts of the organism.
44 different parts were sampled, including:

  • Leaf
  • Internode
  • Various taproot zones
  • Stele
  • Primary root
  • Pericarp
  • Seminal root

Developmental Stages 🌱

The actual aim of the experiment was to study gene expression as a function of the plant's developmental stage. 32 different developmental stages were examined.
These include various time intervals after:

  • Pollination
  • Anthesis
  • Sowing

In addition, the number of visible leaves was used to define stages.
Also taken into account were:

  • Flowering stage of the whole plant
  • Fruit formation stage of the whole plant, in several percentage ranges

Exploring and Processing the Results: FPKM (Fragments Per Kilobase of transcript per Million mapped fragments)
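For orientation, FPKM normalizes a raw fragment count by gene length (in kilobases) and by sequencing depth (in millions of mapped fragments). The sketch below applies that standard definition to made-up numbers; it is not how the values in this file were produced, and the function name `fpkm` is only illustrative.

```python
def fpkm(fragments, gene_length_bp, total_mapped_fragments):
    """FPKM = fragments / (gene length in kb * mapped fragments in millions)."""
    length_kb = gene_length_bp / 1_000
    depth_millions = total_mapped_fragments / 1_000_000
    return fragments / (length_kb * depth_millions)

# Made-up example: 500 fragments on a 2 kb gene in a 25-million-fragment library.
value = fpkm(fragments=500, gene_length_bp=2_000, total_mapped_fragments=25_000_000)
print(value)  # 10.0
```

Because of this normalization, values are comparable between genes of different length and between samples of different sequencing depth.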

Print all Columns

results
Gene ID Gene Name root, 3 days after sowing differentiation zone of primary root, 3 days after sowing meristematic zone and elongation zone, 3 days after sowing stele, 3 days after sowing cortical parenchyma of root, 3 days after sowing coleoptile, 6 days after sowing primary root, 6 days after sowing primary root, 7 days after sowing ... seed, 24 days after pollination plant embryo, 24 days after pollination endosperm, 24 days after pollination leaf, 30 days after pollination internode, 30 days after pollination thirteenth leaf, whole plant fruit formation stage 30 to 50% thirteenth leaf, whole plant flowering stage pre-pollination cob, whole plant flowering stage anthers, whole plant flowering stage silks, whole plant flowering stage
0 Zm00001eb000010 Zm00001eb000010 2.0 2.0 2.0 2.0 3.0 4.0 3.0 3.0 ... 2.0 5.0 0.7 10.0 6.0 6.0 10.0 7.0 4.0 4.0
1 Zm00001eb000020 Zm00001eb000020 24.0 15.0 38.0 19.0 11.0 38.0 31.0 20.0 ... 23.0 68.0 14.0 0.8 1.0 0.7 1.0 48.0 17.0 16.0
2 Zm00001eb000030 Zm00001eb000030 NaN NaN NaN NaN NaN 0.1 0.6 NaN ... 0.2 NaN NaN 0.1 NaN NaN NaN 0.2 NaN NaN
3 Zm00001eb000040 Zm00001eb000040 NaN NaN NaN 0.5 NaN NaN 0.5 NaN ... NaN NaN NaN 0.1 0.1 NaN NaN 0.3 NaN 0.1
4 Zm00001eb000050 Zm00001eb000050 NaN NaN NaN NaN NaN NaN NaN NaN ... NaN NaN NaN NaN 0.2 NaN NaN NaN NaN NaN
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
37839 Zm00001eb442870 Zm00001eb442870 NaN NaN NaN NaN NaN NaN NaN NaN ... 0.2 NaN 0.3 NaN NaN NaN NaN NaN NaN NaN
37840 Zm00001eb442890 Zm00001eb442890 NaN NaN NaN NaN NaN 0.2 NaN NaN ... 0.1 NaN NaN NaN NaN NaN NaN 0.1 NaN NaN
37841 Zm00001eb442910 Zm00001eb442910 NaN NaN NaN NaN NaN NaN NaN NaN ... NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
37842 Zm00001eb442960 Zm00001eb442960 NaN NaN 0.2 NaN 0.2 1.0 0.8 NaN ... 0.4 1.0 0.1 0.4 0.2 0.1 NaN 0.6 0.5 0.5
37843 Zm00001eb443030 Zm00001eb443030 NaN NaN NaN NaN NaN NaN NaN NaN ... NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN

37844 rows × 94 columns

Check if Columns are equal

results['Gene ID'].equals(results['Gene Name'])
True

Delete Data

Delete duplicate Columns

del results['Gene Name']
results
Gene ID root, 3 days after sowing differentiation zone of primary root, 3 days after sowing meristematic zone and elongation zone, 3 days after sowing stele, 3 days after sowing cortical parenchyma of root, 3 days after sowing coleoptile, 6 days after sowing primary root, 6 days after sowing primary root, 7 days after sowing root, 7 days after sowing ... seed, 24 days after pollination plant embryo, 24 days after pollination endosperm, 24 days after pollination leaf, 30 days after pollination internode, 30 days after pollination thirteenth leaf, whole plant fruit formation stage 30 to 50% thirteenth leaf, whole plant flowering stage pre-pollination cob, whole plant flowering stage anthers, whole plant flowering stage silks, whole plant flowering stage
0 Zm00001eb000010 2.0 2.0 2.0 2.0 3.0 4.0 3.0 3.0 3.0 ... 2.0 5.0 0.7 10.0 6.0 6.0 10.0 7.0 4.0 4.0
1 Zm00001eb000020 24.0 15.0 38.0 19.0 11.0 38.0 31.0 20.0 22.0 ... 23.0 68.0 14.0 0.8 1.0 0.7 1.0 48.0 17.0 16.0
2 Zm00001eb000030 NaN NaN NaN NaN NaN 0.1 0.6 NaN NaN ... 0.2 NaN NaN 0.1 NaN NaN NaN 0.2 NaN NaN
3 Zm00001eb000040 NaN NaN NaN 0.5 NaN NaN 0.5 NaN NaN ... NaN NaN NaN 0.1 0.1 NaN NaN 0.3 NaN 0.1
4 Zm00001eb000050 NaN NaN NaN NaN NaN NaN NaN NaN NaN ... NaN NaN NaN NaN 0.2 NaN NaN NaN NaN NaN
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
37839 Zm00001eb442870 NaN NaN NaN NaN NaN NaN NaN NaN NaN ... 0.2 NaN 0.3 NaN NaN NaN NaN NaN NaN NaN
37840 Zm00001eb442890 NaN NaN NaN NaN NaN 0.2 NaN NaN NaN ... 0.1 NaN NaN NaN NaN NaN NaN 0.1 NaN NaN
37841 Zm00001eb442910 NaN NaN NaN NaN NaN NaN NaN NaN NaN ... NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
37842 Zm00001eb442960 NaN NaN 0.2 NaN 0.2 1.0 0.8 NaN 0.3 ... 0.4 1.0 0.1 0.4 0.2 0.1 NaN 0.6 0.5 0.5
37843 Zm00001eb443030 NaN NaN NaN NaN NaN NaN NaN NaN NaN ... NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN

37844 rows × 93 columns

Exploring the Leftover-Data

Data: After Sowing

after_sowing = results.filter(
    regex='days after sowing')
after_sowing
root, 3 days after sowing differentiation zone of primary root, 3 days after sowing meristematic zone and elongation zone, 3 days after sowing stele, 3 days after sowing cortical parenchyma of root, 3 days after sowing coleoptile, 6 days after sowing primary root, 6 days after sowing primary root, 7 days after sowing root, 7 days after sowing seminal root, 7 days after sowing taproot zone 1, 7 days after sowing taproot zone 2, 7 days after sowing taproot zone 3, 7 days after sowing taproot zone 4, 7 days after sowing
0 2.0 2.0 2.0 2.0 3.0 4.0 3.0 3.0 3.0 2.0 3.0 4.0 4.0 4.0
1 24.0 15.0 38.0 19.0 11.0 38.0 31.0 20.0 22.0 27.0 47.0 29.0 23.0 5.0
2 NaN NaN NaN NaN NaN 0.1 0.6 NaN NaN NaN 0.1 NaN NaN NaN
3 NaN NaN NaN 0.5 NaN NaN 0.5 NaN NaN NaN 0.4 0.2 NaN NaN
4 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
37839 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
37840 NaN NaN NaN NaN NaN 0.2 NaN NaN NaN NaN NaN NaN NaN NaN
37841 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
37842 NaN NaN 0.2 NaN 0.2 1.0 0.8 NaN 0.3 0.3 NaN NaN 0.2 NaN
37843 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN

37844 rows × 14 columns

Calculate how many Genes were not expressed after sowing compared to Total

len_all = len(results.index)

after_sowing = after_sowing.dropna(axis=0, how='all')
len_sowing = len(after_sowing.index)
print(len_all- len_sowing)
7213
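The same figure can be obtained without dropping rows, by counting the rows in which every column is `NaN`. A small sketch with a made-up frame standing in for the `after_sowing` columns:

```python
import numpy as np
import pandas as pd

# Made-up stand-in: three genes, two "after sowing" samples.
demo = pd.DataFrame({
    "root, 3 days after sowing": [2.0, np.nan, np.nan],
    "root, 7 days after sowing": [3.0, 0.1, np.nan],
})

# Rows where all samples are NaN, i.e. genes never detected in this subset.
never_detected = int(demo.isna().all(axis=1).sum())

# Equivalent to the dropna-based difference used above.
dropped = len(demo) - len(demo.dropna(axis=0, how="all"))
print(never_detected, dropped)
```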

Count how many Genes were expressed after sowing by days after sowing and Organism Part

after_sowing.count()
root, 3 days after sowing                                     24582
differentiation zone of primary root, 3 days after sowing     24787
meristematic zone and elongation zone, 3 days after sowing    22893
stele, 3 days after sowing                                    23762
cortical parenchyma of root, 3 days after sowing              25025
coleoptile, 6 days after sowing                               25759
primary root, 6 days after sowing                             25631
primary root, 7 days after sowing                             26315
root, 7 days after sowing                                     26103
seminal root, 7 days after sowing                             25064
taproot zone 1, 7 days after sowing                           23977
taproot zone 2, 7 days after sowing                           25058
taproot zone 3, 7 days after sowing                           25998
taproot zone 4, 7 days after sowing                           26809
dtype: int64

Data: After Pollination

after_pollination = results.filter(
    regex='days after pollination')
after_pollination
seed, 2 days after pollination seed, 4 days after pollination leaf, 6 days after pollination internode, 6 days after pollination seed, 6 days after pollination seed, 8 days after pollination seed, 10 days after pollination leaf, 12 days after pollination internode, 12 days after pollination seed, 12 days after pollination ... seed, 22 days after pollination plant embryo, 22 days after pollination endosperm, 22 days after pollination leaf, 24 days after pollination internode, 24 days after pollination seed, 24 days after pollination plant embryo, 24 days after pollination endosperm, 24 days after pollination leaf, 30 days after pollination internode, 30 days after pollination
0 5.0 6.0 14.0 11.0 5.0 4.0 4.0 12.0 6.0 3.0 ... 2.0 5.0 0.9 9.0 7.0 2.0 5.0 0.7 10.0 6.0
1 42.0 40.0 0.9 2.0 30.0 29.0 28.0 0.6 4.0 41.0 ... 25.0 84.0 15.0 0.7 1.0 23.0 68.0 14.0 0.8 1.0
2 NaN 0.2 0.4 NaN 0.2 NaN 0.1 NaN NaN NaN ... 0.1 NaN NaN 0.3 0.6 0.2 NaN NaN 0.1 NaN
3 0.2 0.1 NaN NaN 0.1 0.1 0.1 0.1 NaN NaN ... NaN NaN NaN NaN 0.1 NaN NaN NaN 0.1 0.1
4 0.1 NaN NaN NaN 0.1 NaN NaN NaN NaN NaN ... NaN NaN NaN NaN NaN NaN NaN NaN NaN 0.2
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
37839 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN ... NaN NaN 0.2 NaN NaN 0.2 NaN 0.3 NaN NaN
37840 0.2 NaN NaN NaN NaN NaN 0.2 NaN NaN NaN ... 0.2 NaN 0.3 NaN NaN 0.1 NaN NaN NaN NaN
37841 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN ... NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
37842 2.0 3.0 0.4 0.3 3.0 2.0 2.0 0.2 NaN 0.8 ... 0.4 2.0 0.2 0.1 0.2 0.4 1.0 0.1 0.4 0.2
37843 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN ... NaN NaN NaN 0.1 NaN NaN NaN NaN NaN NaN

37844 rows × 35 columns

Calculate how many Genes were not expressed after pollination compared to Total

len_all = len(results.index)

after_pollination = after_pollination.dropna(axis=0, how='all')
len_pollination = len(after_pollination.index)
print(len_all- len_pollination)
2077

Calculate how many more Genes were expressed compared to sowing

print(len_pollination - len_sowing)
5136
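The difference above compares two totals; it does not say which genes differ. A sketch of a set-based comparison on the row indices, using made-up frames (the names `sowing` and `pollination` stand in for `after_sowing` and `after_pollination` after `dropna`):

```python
import pandas as pd

# Made-up stand-ins, indexed by hypothetical gene labels.
sowing = pd.DataFrame({"x": [1.0, 2.0]}, index=["g1", "g2"])
pollination = pd.DataFrame({"y": [1.0, 2.0, 3.0]}, index=["g2", "g3", "g4"])

only_pollination = pollination.index.difference(sowing.index)
only_sowing = sowing.index.difference(pollination.index)
both = pollination.index.intersection(sowing.index)

# The net difference equals the difference of the two totals.
net = len(only_pollination) - len(only_sowing)
print(list(only_pollination), list(only_sowing), list(both), net)
```

A net gain can hide genes that are detected only after sowing, so the two one-sided sets are worth inspecting separately.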

Count how many Genes were expressed after pollination by days after pollination and Organism Part

after_pollination.count()
seed, 2 days after pollination             26907
seed, 4 days after pollination             27090
leaf, 6 days after pollination             26035
internode, 6 days after pollination        25390
seed, 6 days after pollination             26935
seed, 8 days after pollination             27191
seed, 10 days after pollination            27919
leaf, 12 days after pollination            25413
internode, 12 days after pollination       26289
seed, 12 days after pollination            25672
endosperm, 12 days after pollination       24969
seed, 14 days after pollination            26669
endosperm, 14 days after pollination       23174
seed, 16 days after pollination            25538
plant embryo, 16 days after pollination    24076
endosperm, 16 days after pollination       22740
leaf, 18 days after pollination            25620
internode, 18 days after pollination       25385
seed, 18 days after pollination            26032
plant embryo, 18 days after pollination    25016
endosperm, 18 days after pollination       24444
pericarp, 18 days after pollination        25738
seed, 20 days after pollination            26429
plant embryo, 20 days after pollination    24794
endosperm, 20 days after pollination       23508
seed, 22 days after pollination            26463
plant embryo, 22 days after pollination    24485
endosperm, 22 days after pollination       23009
leaf, 24 days after pollination            25702
internode, 24 days after pollination       25442
seed, 24 days after pollination            26432
plant embryo, 24 days after pollination    25769
endosperm, 24 days after pollination       22742
leaf, 30 days after pollination            26485
internode, 30 days after pollination       25341
dtype: int64

Compare Genes from Leaf by Days

after_pollination.filter(
    regex='leaf').count()
leaf, 6 days after pollination     26035
leaf, 12 days after pollination    25413
leaf, 18 days after pollination    25620
leaf, 24 days after pollination    25702
leaf, 30 days after pollination    26485
dtype: int64

Compare Genes from Endosperm by Days

after_pollination.filter(
    regex='endosperm').count()
endosperm, 12 days after pollination    24969
endosperm, 14 days after pollination    23174
endosperm, 16 days after pollination    22740
endosperm, 18 days after pollination    24444
endosperm, 20 days after pollination    23508
endosperm, 22 days after pollination    23009
endosperm, 24 days after pollination    22742
dtype: int64

Interim Conclusions on the Result Data 🏁

After Sowing

Gene expression was measured on several days: 3, 6, and 7 days after sowing.

Samples taken 3 days after sowing came from the following organism parts:

  • Root
  • Differentiation zone of the primary root
  • Meristematic zone and elongation zone
  • Stele
  • Cortical parenchyma of the root

On day 6 after sowing, the following parts were examined:

  • Coleoptile
  • Primary root

One day later, the last measurements of this developmental stage were taken in the following parts:

  • Primary root
  • Root
  • Seminal root
  • Taproot zones 1 to 4

After sowing, 7213 fewer genes were expressed than across all growth stages combined.
The number of expressed genes also rose with the number of days after sowing.
After 3 days the count was 22893 to 25025, at 6 days 25631 to 25759, and at 7 days 23977 to 26809.
An increase is visible. It is unclear whether it is really caused by the number of days elapsed, and thus by the growth phase, or whether it stems from the different sampling sites.
Two zones were each sampled on more than one day. These samples suggest that more days after sowing lead to more expressed genes. They are the following:

Day   Primary root   Root
3     -              24582
6     25631          -
7     26315          26103
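The table above can be derived from the column labels, which follow the pattern `<organism part>, <N> days after sowing`. The sketch below parses the labels with a regular expression and groups the counts reported by `after_sowing.count()` by organism part and day; only the four relevant labels are copied here, and the variable names are illustrative.

```python
import re
from collections import defaultdict

# Counts copied from the after_sowing.count() output above.
counts = {
    "root, 3 days after sowing": 24582,
    "primary root, 6 days after sowing": 25631,
    "primary root, 7 days after sowing": 26315,
    "root, 7 days after sowing": 26103,
}
label_pattern = re.compile(r"^(?P<part>.+), (?P<day>\d+) days after sowing$")

# organism part -> {day: number of expressed genes}
by_part = defaultdict(dict)
for label, n_genes in counts.items():
    match = label_pattern.match(label)
    if match:
        by_part[match["part"]][int(match["day"])] = n_genes

for part, per_day in by_part.items():
    print(part, sorted(per_day.items()))
```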

After Pollination

After pollination, measurements were taken at 2-day intervals from day 2 to day 24, with a final measurement on day 30. As the growth stage advanced, measurements were also taken at different sites. The following were tested, for example:

  • Seed
  • Leaf
  • Internode
  • Endosperm
  • Plant embryo
  • Pericarp

Compared with sowing, 5136 more genes were expressed. Here the number of expressed genes does not appear to be directly related to the number of days elapsed within the developmental stage. This cannot be stated with certainty, however, because the samples were taken at different sites.
The leaf samples were taken at regular 6-day intervals. The data show no linear relationship here.

Days after pollination   Number of expressed genes
6                        26035
12                       25413
18                       25620
24                       25702
30                       26485

To make sure this is not just a property of the leaves, I also filtered the endosperm data by day. Here too, no linear relationship is apparent.

Days after pollination   Number of expressed genes
12                       24969
14                       23174
16                       22740
18                       24444
20                       23508
22                       23009
24                       22742
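The statement about a missing linear relationship can be made slightly more concrete with a Pearson correlation coefficient between day and gene count. The sketch below uses the counts from the two tables above and a hand-written helper (`pearson_r` is an illustrative name); with only 5 and 7 data points, the coefficients are descriptive at best.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

# Days after pollination and number of expressed genes, copied from above.
leaf_days = [6, 12, 18, 24, 30]
leaf_counts = [26035, 25413, 25620, 25702, 26485]
endosperm_days = [12, 14, 16, 18, 20, 22, 24]
endosperm_counts = [24969, 23174, 22740, 24444, 23508, 23009, 22742]

r_leaf = pearson_r(leaf_days, leaf_counts)
r_endosperm = pearson_r(endosperm_days, endosperm_counts)
print(round(r_leaf, 2), round(r_endosperm, 2))
```

The coefficient comes out moderately positive for leaf and moderately negative for endosperm, far from the values near 1 or -1 that a clear linear trend would give.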

import seaborn as sns
# %matplotlib inline

# Treat missing FPKM values as 0 and use the gene ID as row index.
results = results.fillna(0)
results.set_index('Gene ID', inplace=True)


def tissue_by_day(df, tissue):
    """Select one organism part and relabel its columns by day after pollination."""
    subset = df.filter(regex=tissue).copy()
    subset.columns = (
        subset.columns
        .str.replace("days after pollination", "", regex=False)
        .str.replace(f"{tissue}, ", "", regex=False)
    )
    return subset


# One frame per organism part; columns are the days after pollination.
endosperm_after_pollination = tissue_by_day(after_pollination, 'endosperm')
seed_after_pollination = tissue_by_day(after_pollination, 'seed')
leaf_after_pollination = tissue_by_day(after_pollination, 'leaf')
internode_after_pollination = tissue_by_day(after_pollination, 'internode')
plant_embryo_after_pollination = tissue_by_day(after_pollination, 'plant embryo')
pericarp_after_pollination = tissue_by_day(after_pollination, 'pericarp')


seed_after_pollination
2 4 6 8 10 12 14 16 18 20 22 24
0 5.0 6.0 5.0 4.0 4.0 3.0 3.0 2.0 2.0 1.0 2.0 2.0
1 42.0 40.0 30.0 29.0 28.0 41.0 31.0 23.0 23.0 19.0 25.0 23.0
2 NaN 0.2 0.2 NaN 0.1 NaN 0.1 NaN NaN NaN 0.1 0.2
3 0.2 0.1 0.1 0.1 0.1 NaN NaN NaN 0.2 0.1 NaN NaN
4 0.1 NaN 0.1 NaN NaN NaN NaN NaN NaN NaN NaN NaN
... ... ... ... ... ... ... ... ... ... ... ... ...
37838 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
37839 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN 0.2
37840 0.2 NaN NaN NaN 0.2 NaN 0.5 NaN 0.2 0.1 0.2 0.1
37842 2.0 3.0 3.0 2.0 2.0 0.8 1.0 0.3 0.5 0.4 0.4 0.4
37843 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN

35767 rows × 12 columns

Principal Component Analysis

from sklearn.decomposition import PCA
from sklearn import preprocessing
import matplotlib.pyplot as plt
scaled_data = preprocessing.scale(results.T)
pca = PCA()
pca.fit(scaled_data)
pca_data = pca.transform(scaled_data)

per_var = np.round(pca.explained_variance_ratio_ * 100, decimals=1)
labels = ['PC' + str(x) for x in range(1, len(per_var)+1)]

plt.bar(x=range(1,len(per_var)+1), height=per_var, tick_label=labels)
plt.ylabel('Percentage of Explained Variance')
plt.xlabel('Principal Component')
plt.xticks(rotation=90)
plt.title('Scree Plot')
plt.show()

[Figure: Scree Plot, percentage of explained variance per principal component]

Result: The first 2 principal components explain the largest shares of the variance
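How much two components capture can be judged from the cumulative explained variance. The sketch below computes it with a plain NumPy SVD on random stand-in data rather than with scikit-learn's `PCA`; on the real data the same quantity is `np.cumsum(pca.explained_variance_ratio_)`.

```python
import numpy as np

# Random stand-in data: 12 samples x 50 genes (not the real expression table).
rng = np.random.default_rng(0)
samples = rng.normal(size=(12, 50))

# PCA via SVD of the column-centered matrix.
centered = samples - samples.mean(axis=0)
singular_values = np.linalg.svd(centered, compute_uv=False)
explained_ratio = singular_values**2 / np.sum(singular_values**2)
cumulative = np.cumsum(explained_ratio)

# Number of components needed to reach 80% of the variance.
n_for_80 = int(np.searchsorted(cumulative, 0.80) + 1)
print(np.round(cumulative[:3], 3), n_for_80)
```

If the first two entries of `cumulative` add up to well under 1, a PC1/PC2 scatter plot shows only part of the structure in the data.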

pca_df = pd.DataFrame(pca_data, columns=labels, index=results.columns)

plt.rcParams["figure.figsize"] = (15,15)
plt.scatter(pca_df.PC1, pca_df.PC2)
plt.title('PCA Graph')
plt.xlabel(f'PC1- {per_var[0]}%')
plt.ylabel(f'PC2- {per_var[1]}%')

pca_df
for sample in pca_df.index:
    plt.annotate(sample.partition(',')[0], (pca_df.PC1.loc[sample], pca_df.PC2.loc[sample]),  rotation=45)
    

plt.show()

[Figure: PCA Graph, PC1 against PC2, each sample labelled with its organism part]

Find the Top 10 Genes Separating the Samples along PC1

loading_scores = pd.Series(pca.components_[0], index=results.index)
sorted_loading_scores = loading_scores.abs().sort_values(ascending=False)
top_10_genes = sorted_loading_scores[0:10].index.values
loading_scores[top_10_genes]
Gene ID
Zm00001eb059170   -0.012067
Zm00001eb246940   -0.012066
Zm00001eb216140   -0.012039
Zm00001eb077390   -0.012038
Zm00001eb423880   -0.012003
Zm00001eb301640   -0.011932
Zm00001eb077380   -0.011899
Zm00001eb232720   -0.011895
Zm00001eb395490   -0.011878
Zm00001eb285560   -0.011875
dtype: float64

Show the DataFrame rows for the top 3 Genes to check the PCA

df = results.loc[["Zm00001eb059170", "Zm00001eb246940", "Zm00001eb216140"]]
df
root, 3 days after sowing differentiation zone of primary root, 3 days after sowing meristematic zone and elongation zone, 3 days after sowing stele, 3 days after sowing cortical parenchyma of root, 3 days after sowing coleoptile, 6 days after sowing primary root, 6 days after sowing primary root, 7 days after sowing root, 7 days after sowing seminal root, 7 days after sowing ... seed, 24 days after pollination plant embryo, 24 days after pollination endosperm, 24 days after pollination leaf, 30 days after pollination internode, 30 days after pollination thirteenth leaf, whole plant fruit formation stage 30 to 50% thirteenth leaf, whole plant flowering stage pre-pollination cob, whole plant flowering stage anthers, whole plant flowering stage silks, whole plant flowering stage
Gene ID
Zm00001eb059170 10.0 10.0 15.0 11.0 9.0 13.0 8.0 11.0 11.0 14.0 ... 10.0 20.0 7.0 4.0 11.0 2.0 1.0 18.0 4.0 11.0
Zm00001eb246940 23.0 19.0 32.0 30.0 10.0 31.0 22.0 22.0 23.0 23.0 ... 30.0 50.0 26.0 9.0 24.0 5.0 6.0 48.0 10.0 35.0
Zm00001eb216140 6.0 3.0 7.0 5.0 3.0 9.0 6.0 5.0 5.0 5.0 ... 4.0 8.0 3.0 1.0 4.0 0.8 0.8 9.0 2.0 4.0

3 rows × 92 columns
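One way to make this check less dependent on reading a wide table is to compare the mean expression in the leaf columns with the mean in all other columns. The following is a sketch on made-up values (gene labels and numbers are hypothetical); on the real data, `demo` would be `results.loc[top_10_genes]`.

```python
import pandas as pd

# Made-up stand-in: two genes, two leaf samples and two non-leaf samples.
demo = pd.DataFrame(
    {
        "root, 3 days after sowing": [10.0, 23.0],
        "seed, 24 days after pollination": [10.0, 30.0],
        "leaf, 30 days after pollination": [4.0, 9.0],
        "thirteenth leaf, whole plant flowering stage": [1.0, 6.0],
    },
    index=["geneA", "geneB"],
)

# Boolean mask over the columns: True where the label mentions a leaf.
is_leaf = demo.columns.str.contains("leaf")
summary = pd.DataFrame({
    "mean_leaf": demo.loc[:, is_leaf].mean(axis=1),
    "mean_other": demo.loc[:, ~is_leaf].mean(axis=1),
})
summary["lower_in_leaf"] = summary["mean_leaf"] < summary["mean_other"]
print(summary)
```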


PCA Results 🏁

The first 2 principal components explain the largest shares of the variance (11-16% each), while every other principal component explains less than 10%. Together the first two therefore cover well under half of the total variance, so the PC1/PC2 plot shows the dominant trends rather than the complete picture.

The PCA shows distinct clusters for the different plant parts. The seed samples correlate clearly with one another. The endosperm and leaf samples also form distinct clusters. The seed and endosperm clusters lie close together, which suggests that these plant parts behave similarly in their gene expression. These clusters differ clearly from the leaf samples. The large distance between the clusters indicates that gene expression in the leaves behaves markedly differently from that in the seeds. In general, the leaf clusters separate clearly from all other sampling sites.

Examining which genes separate the clusters most strongly yielded the 10 genes listed with their loading scores above.

According to the principal component analysis, these 10 genes show the largest expression differences. They therefore differ most strongly in the leaves compared with all other plant parts. Selecting the corresponding rows additionally supports the PCA: the genes named are expressed markedly less in the leaves than in all other plant parts.