"Transcription profiling by high throughput sequencing in different developmental stages of Zea mays subsp. mays tissues"
Set: RNA-Seq mRNA baseline
Organism: Zea mays
Data Source: Expression Atlas (EBI) dataset
- Dataset
- Table of Contents
- Preparation
- Exploring and Processing the Experimental Data
- Exploring and Processing the Results
import matplotlib.pyplot as plt
import matplotlib as mpl
import numpy as np
import pandas as pd
exp_design = pd.read_csv('E-MTAB-4342-experiment-design.tsv', sep='\t', header=0)
results = pd.read_csv('E-MTAB-4342-query-results.fpkms.tsv', sep='\t', header=4)
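The `header=4` argument above implies that the results file has four lines before the column header. A minimal sketch of an alternative, assuming those four lines are `#`-prefixed comment lines (the file itself is not shown here, so the toy content below is hypothetical): `comment="#"` lets pandas skip them without counting lines by hand.

```python
import io
import pandas as pd

# Toy stand-in for the downloaded file; the comment lines are hypothetical.
raw = (
    "# Expression Atlas\n"
    "# Query: hypothetical\n"
    "# Timestamp: hypothetical\n"
    "# one more comment line\n"
    "Gene ID\tGene Name\troot, 3 days after sowing\n"
    "Zm00001eb000010\tZm00001eb000010\t2.0\n"
    "Zm00001eb000020\tZm00001eb000020\t24.0\n"
)

# Variant used in the notebook: the header is the line at position 4.
by_position = pd.read_csv(io.StringIO(raw), sep="\t", header=4)

# Alternative: skip every line that starts with '#'.
by_comment = pd.read_csv(io.StringIO(raw), sep="\t", comment="#")

print(by_position.equals(by_comment))
```

The `comment` variant would misbehave if a data field itself contained `#`, so it is only a convenience when that can be ruled out.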
exp_design.columns
Index(['Run', 'Sample Characteristic[cultivar]',
'Sample Characteristic Ontology Term[cultivar]',
'Sample Characteristic[developmental stage]',
'Sample Characteristic Ontology Term[developmental stage]',
'Sample Characteristic[organism]',
'Sample Characteristic Ontology Term[organism]',
'Sample Characteristic[organism part]',
'Sample Characteristic Ontology Term[organism part]',
'Factor Value[developmental stage]',
'Factor Value Ontology Term[developmental stage]',
'Factor Value[organism part]',
'Factor Value Ontology Term[organism part]', 'Analysed'],
dtype='object')
exp_design['Sample Characteristic Ontology Term[developmental stage]'].equals(exp_design['Factor Value Ontology Term[developmental stage]'])
True
exp_design['Sample Characteristic Ontology Term[organism part]'].equals(exp_design['Factor Value Ontology Term[organism part]'])
True
exp_design['Sample Characteristic[organism part]'].equals(exp_design['Factor Value[organism part]'])
True
exp_design['Sample Characteristic[developmental stage]'].equals(exp_design['Factor Value[developmental stage]'])
True
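The four pairwise `equals` checks above could also be run over all column pairs at once. A minimal sketch on a toy frame; `duplicate_column_pairs` is a hypothetical helper, not part of pandas:

```python
from itertools import combinations

import pandas as pd


def duplicate_column_pairs(df):
    """Return (a, b) pairs of column names whose contents are identical."""
    return [(a, b) for a, b in combinations(df.columns, 2) if df[a].equals(df[b])]


# Toy data shaped like the experiment design table
toy = pd.DataFrame({
    "Sample Characteristic[organism part]": ["leaf", "seed", "leaf"],
    "Factor Value[organism part]": ["leaf", "seed", "leaf"],
    "Run": ["r1", "r2", "r3"],
})

pairs = duplicate_column_pairs(toy)
print(pairs)
```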
# Drop every column that holds a single value for all samples
for col in exp_design.columns:
    if len(exp_design[col].unique()) == 1:
        del exp_design[col]
exp_design.columns
Index(['Run', 'Sample Characteristic[developmental stage]',
'Sample Characteristic Ontology Term[developmental stage]',
'Sample Characteristic[organism part]',
'Sample Characteristic Ontology Term[organism part]',
'Factor Value[developmental stage]',
'Factor Value Ontology Term[developmental stage]',
'Factor Value[organism part]',
'Factor Value Ontology Term[organism part]', 'Analysed'],
dtype='object')
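Dropping constant columns, as in the loop above, can also be expressed as a single boolean column selection. A small sketch on toy data; `dropna=False` is chosen so that NaN counts as a value, which matches `len(col.unique())`:

```python
import pandas as pd

# Toy data: one column is constant across all samples
toy = pd.DataFrame({
    "Run": ["r1", "r2", "r3"],
    "Sample Characteristic[organism]": ["Zea mays"] * 3,
    "Sample Characteristic[organism part]": ["leaf", "seed", "leaf"],
})

# Keep only the columns with more than one distinct value
varying = toy.loc[:, toy.nunique(dropna=False) > 1]
print(list(varying.columns))
```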
del exp_design['Factor Value Ontology Term[developmental stage]']
del exp_design['Factor Value Ontology Term[organism part]']
del exp_design['Factor Value[organism part]']
del exp_design['Factor Value[developmental stage]']
del exp_design['Sample Characteristic Ontology Term[developmental stage]']
del exp_design['Sample Characteristic Ontology Term[organism part]']
del exp_design['Analysed']
del exp_design['Run']
exp_design = exp_design.rename(columns={"Sample Characteristic[developmental stage]": "developmental stage", "Sample Characteristic[organism part]": "organism part"})
exp_design
developmental stage | organism part | |
---|---|---|
0 | 6 days after pollination | leaf |
1 | 6 days after pollination | leaf |
2 | 6 days after anthesis | leaf |
3 | 12 days after pollination | leaf |
4 | 12 days after pollination | leaf |
... | ... | ... |
265 | 16 days after pollination | endosperm |
266 | 16 days after pollination | plant embryo |
267 | 16 days after pollination | plant embryo |
268 | 16 days after pollination | plant embryo |
269 | 18 days after pollination | seed |
270 rows × 2 columns
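The column cleanup above removes eight columns one by one and then renames the remaining two. An equivalent, shorter form is to select the two columns of interest and rename them in one step. A sketch on a toy frame with the same column names:

```python
import pandas as pd

rename_map = {
    "Sample Characteristic[developmental stage]": "developmental stage",
    "Sample Characteristic[organism part]": "organism part",
}

# Toy data with two extra columns that should be discarded
toy = pd.DataFrame({
    "Run": ["r1", "r2"],
    "Sample Characteristic[developmental stage]": ["6 days after pollination", "7 days after sowing"],
    "Sample Characteristic[organism part]": ["leaf", "root"],
    "Analysed": ["Yes", "Yes"],
})

# Select the wanted columns, then rename them
slim = toy[list(rename_map)].rename(columns=rename_map)
print(slim)
```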
print(exp_design["organism part"].nunique(), exp_design["developmental stage"].nunique())
44 32
exp_design.describe()
developmental stage | organism part | |
---|---|---|
count | 267 | 270 |
unique | 32 | 44 |
top | 7 days after sowing | seed |
freq | 24 | 33 |
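In the `describe()` output above, `count` is 267 for the developmental stage but 270 for the organism part, so three samples have no developmental stage. A sketch of how such rows could be located, shown on toy data:

```python
import pandas as pd

# Toy data: the second sample has no developmental stage
toy = pd.DataFrame({
    "developmental stage": ["6 days after pollination", None, "7 days after sowing"],
    "organism part": ["leaf", "seed", "root"],
})

# Number of missing entries per column
missing_per_column = toy.isna().sum()

# The affected rows themselves
rows_missing_stage = toy[toy["developmental stage"].isna()]

print(missing_per_column)
print(rows_missing_stage)
```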
To avoid skewing the results through differing genetic backgrounds, an inbred line of the cultivar B73 was bred. The inbred line ensures that all sampled plants share the same genetic makeup.
Expression levels were also to be measured in different parts of the organism.
44 different organism parts were measured, including:
- leaf
- internode
- various taproot zones
- stele
- primary root
- pericarp
- seminal root
The actual aim of the experiment was to study gene expression as a function of the plant's developmental stage. 32 different developmental stages were examined.
These include various time intervals after:
- pollination
- anthesis
- sowing
In addition, the number of visible leaves was used to define stages.
Also taken into account were:
- the flowering stage of the whole plant
- the fruit formation stage of the whole plant, in various percentage ranges
results
Gene ID | Gene Name | root, 3 days after sowing | differentiation zone of primary root, 3 days after sowing | meristematic zone and elongation zone, 3 days after sowing | stele, 3 days after sowing | cortical parenchyma of root, 3 days after sowing | coleoptile, 6 days after sowing | primary root, 6 days after sowing | primary root, 7 days after sowing | ... | seed, 24 days after pollination | plant embryo, 24 days after pollination | endosperm, 24 days after pollination | leaf, 30 days after pollination | internode, 30 days after pollination | thirteenth leaf, whole plant fruit formation stage 30 to 50% | thirteenth leaf, whole plant flowering stage | pre-pollination cob, whole plant flowering stage | anthers, whole plant flowering stage | silks, whole plant flowering stage | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | Zm00001eb000010 | Zm00001eb000010 | 2.0 | 2.0 | 2.0 | 2.0 | 3.0 | 4.0 | 3.0 | 3.0 | ... | 2.0 | 5.0 | 0.7 | 10.0 | 6.0 | 6.0 | 10.0 | 7.0 | 4.0 | 4.0 |
1 | Zm00001eb000020 | Zm00001eb000020 | 24.0 | 15.0 | 38.0 | 19.0 | 11.0 | 38.0 | 31.0 | 20.0 | ... | 23.0 | 68.0 | 14.0 | 0.8 | 1.0 | 0.7 | 1.0 | 48.0 | 17.0 | 16.0 |
2 | Zm00001eb000030 | Zm00001eb000030 | NaN | NaN | NaN | NaN | NaN | 0.1 | 0.6 | NaN | ... | 0.2 | NaN | NaN | 0.1 | NaN | NaN | NaN | 0.2 | NaN | NaN |
3 | Zm00001eb000040 | Zm00001eb000040 | NaN | NaN | NaN | 0.5 | NaN | NaN | 0.5 | NaN | ... | NaN | NaN | NaN | 0.1 | 0.1 | NaN | NaN | 0.3 | NaN | 0.1 |
4 | Zm00001eb000050 | Zm00001eb000050 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | ... | NaN | NaN | NaN | NaN | 0.2 | NaN | NaN | NaN | NaN | NaN |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
37839 | Zm00001eb442870 | Zm00001eb442870 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | ... | 0.2 | NaN | 0.3 | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
37840 | Zm00001eb442890 | Zm00001eb442890 | NaN | NaN | NaN | NaN | NaN | 0.2 | NaN | NaN | ... | 0.1 | NaN | NaN | NaN | NaN | NaN | NaN | 0.1 | NaN | NaN |
37841 | Zm00001eb442910 | Zm00001eb442910 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
37842 | Zm00001eb442960 | Zm00001eb442960 | NaN | NaN | 0.2 | NaN | 0.2 | 1.0 | 0.8 | NaN | ... | 0.4 | 1.0 | 0.1 | 0.4 | 0.2 | 0.1 | NaN | 0.6 | 0.5 | 0.5 |
37843 | Zm00001eb443030 | Zm00001eb443030 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
37844 rows × 94 columns
results['Gene ID'].equals(results['Gene Name'])
True
del results['Gene Name']
results
Gene ID | root, 3 days after sowing | differentiation zone of primary root, 3 days after sowing | meristematic zone and elongation zone, 3 days after sowing | stele, 3 days after sowing | cortical parenchyma of root, 3 days after sowing | coleoptile, 6 days after sowing | primary root, 6 days after sowing | primary root, 7 days after sowing | root, 7 days after sowing | ... | seed, 24 days after pollination | plant embryo, 24 days after pollination | endosperm, 24 days after pollination | leaf, 30 days after pollination | internode, 30 days after pollination | thirteenth leaf, whole plant fruit formation stage 30 to 50% | thirteenth leaf, whole plant flowering stage | pre-pollination cob, whole plant flowering stage | anthers, whole plant flowering stage | silks, whole plant flowering stage | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | Zm00001eb000010 | 2.0 | 2.0 | 2.0 | 2.0 | 3.0 | 4.0 | 3.0 | 3.0 | 3.0 | ... | 2.0 | 5.0 | 0.7 | 10.0 | 6.0 | 6.0 | 10.0 | 7.0 | 4.0 | 4.0 |
1 | Zm00001eb000020 | 24.0 | 15.0 | 38.0 | 19.0 | 11.0 | 38.0 | 31.0 | 20.0 | 22.0 | ... | 23.0 | 68.0 | 14.0 | 0.8 | 1.0 | 0.7 | 1.0 | 48.0 | 17.0 | 16.0 |
2 | Zm00001eb000030 | NaN | NaN | NaN | NaN | NaN | 0.1 | 0.6 | NaN | NaN | ... | 0.2 | NaN | NaN | 0.1 | NaN | NaN | NaN | 0.2 | NaN | NaN |
3 | Zm00001eb000040 | NaN | NaN | NaN | 0.5 | NaN | NaN | 0.5 | NaN | NaN | ... | NaN | NaN | NaN | 0.1 | 0.1 | NaN | NaN | 0.3 | NaN | 0.1 |
4 | Zm00001eb000050 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | ... | NaN | NaN | NaN | NaN | 0.2 | NaN | NaN | NaN | NaN | NaN |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
37839 | Zm00001eb442870 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | ... | 0.2 | NaN | 0.3 | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
37840 | Zm00001eb442890 | NaN | NaN | NaN | NaN | NaN | 0.2 | NaN | NaN | NaN | ... | 0.1 | NaN | NaN | NaN | NaN | NaN | NaN | 0.1 | NaN | NaN |
37841 | Zm00001eb442910 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
37842 | Zm00001eb442960 | NaN | NaN | 0.2 | NaN | 0.2 | 1.0 | 0.8 | NaN | 0.3 | ... | 0.4 | 1.0 | 0.1 | 0.4 | 0.2 | 0.1 | NaN | 0.6 | 0.5 | 0.5 |
37843 | Zm00001eb443030 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
37844 rows × 93 columns
after_sowing = results.filter(
regex='days after sowing')
after_sowing
root, 3 days after sowing | differentiation zone of primary root, 3 days after sowing | meristematic zone and elongation zone, 3 days after sowing | stele, 3 days after sowing | cortical parenchyma of root, 3 days after sowing | coleoptile, 6 days after sowing | primary root, 6 days after sowing | primary root, 7 days after sowing | root, 7 days after sowing | seminal root, 7 days after sowing | taproot zone 1, 7 days after sowing | taproot zone 2, 7 days after sowing | taproot zone 3, 7 days after sowing | taproot zone 4, 7 days after sowing | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 2.0 | 2.0 | 2.0 | 2.0 | 3.0 | 4.0 | 3.0 | 3.0 | 3.0 | 2.0 | 3.0 | 4.0 | 4.0 | 4.0 |
1 | 24.0 | 15.0 | 38.0 | 19.0 | 11.0 | 38.0 | 31.0 | 20.0 | 22.0 | 27.0 | 47.0 | 29.0 | 23.0 | 5.0 |
2 | NaN | NaN | NaN | NaN | NaN | 0.1 | 0.6 | NaN | NaN | NaN | 0.1 | NaN | NaN | NaN |
3 | NaN | NaN | NaN | 0.5 | NaN | NaN | 0.5 | NaN | NaN | NaN | 0.4 | 0.2 | NaN | NaN |
4 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
37839 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
37840 | NaN | NaN | NaN | NaN | NaN | 0.2 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
37841 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
37842 | NaN | NaN | 0.2 | NaN | 0.2 | 1.0 | 0.8 | NaN | 0.3 | 0.3 | NaN | NaN | 0.2 | NaN |
37843 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
37844 rows × 14 columns
len_all = len(results.index)
after_sowing = after_sowing.dropna(axis=0, how='all')
len_sowing = len(after_sowing.index)
print(len_all- len_sowing)
7213
after_sowing.count()
root, 3 days after sowing 24582
differentiation zone of primary root, 3 days after sowing 24787
meristematic zone and elongation zone, 3 days after sowing 22893
stele, 3 days after sowing 23762
cortical parenchyma of root, 3 days after sowing 25025
coleoptile, 6 days after sowing 25759
primary root, 6 days after sowing 25631
primary root, 7 days after sowing 26315
root, 7 days after sowing 26103
seminal root, 7 days after sowing 25064
taproot zone 1, 7 days after sowing 23977
taproot zone 2, 7 days after sowing 25058
taproot zone 3, 7 days after sowing 25998
taproot zone 4, 7 days after sowing 26809
dtype: int64
after_pollination = results.filter(
regex='days after pollination')
after_pollination
seed, 2 days after pollination | seed, 4 days after pollination | leaf, 6 days after pollination | internode, 6 days after pollination | seed, 6 days after pollination | seed, 8 days after pollination | seed, 10 days after pollination | leaf, 12 days after pollination | internode, 12 days after pollination | seed, 12 days after pollination | ... | seed, 22 days after pollination | plant embryo, 22 days after pollination | endosperm, 22 days after pollination | leaf, 24 days after pollination | internode, 24 days after pollination | seed, 24 days after pollination | plant embryo, 24 days after pollination | endosperm, 24 days after pollination | leaf, 30 days after pollination | internode, 30 days after pollination | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 5.0 | 6.0 | 14.0 | 11.0 | 5.0 | 4.0 | 4.0 | 12.0 | 6.0 | 3.0 | ... | 2.0 | 5.0 | 0.9 | 9.0 | 7.0 | 2.0 | 5.0 | 0.7 | 10.0 | 6.0 |
1 | 42.0 | 40.0 | 0.9 | 2.0 | 30.0 | 29.0 | 28.0 | 0.6 | 4.0 | 41.0 | ... | 25.0 | 84.0 | 15.0 | 0.7 | 1.0 | 23.0 | 68.0 | 14.0 | 0.8 | 1.0 |
2 | NaN | 0.2 | 0.4 | NaN | 0.2 | NaN | 0.1 | NaN | NaN | NaN | ... | 0.1 | NaN | NaN | 0.3 | 0.6 | 0.2 | NaN | NaN | 0.1 | NaN |
3 | 0.2 | 0.1 | NaN | NaN | 0.1 | 0.1 | 0.1 | 0.1 | NaN | NaN | ... | NaN | NaN | NaN | NaN | 0.1 | NaN | NaN | NaN | 0.1 | 0.1 |
4 | 0.1 | NaN | NaN | NaN | 0.1 | NaN | NaN | NaN | NaN | NaN | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 0.2 |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
37839 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | ... | NaN | NaN | 0.2 | NaN | NaN | 0.2 | NaN | 0.3 | NaN | NaN |
37840 | 0.2 | NaN | NaN | NaN | NaN | NaN | 0.2 | NaN | NaN | NaN | ... | 0.2 | NaN | 0.3 | NaN | NaN | 0.1 | NaN | NaN | NaN | NaN |
37841 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
37842 | 2.0 | 3.0 | 0.4 | 0.3 | 3.0 | 2.0 | 2.0 | 0.2 | NaN | 0.8 | ... | 0.4 | 2.0 | 0.2 | 0.1 | 0.2 | 0.4 | 1.0 | 0.1 | 0.4 | 0.2 |
37843 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | ... | NaN | NaN | NaN | 0.1 | NaN | NaN | NaN | NaN | NaN | NaN |
37844 rows × 35 columns
len_all = len(results.index)
after_pollination = after_pollination.dropna(axis=0, how='all')
len_pollination = len(after_pollination.index)
print(len_all- len_pollination)
2077
print(len_pollination - len_sowing)
5136
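The value printed above is a difference of two row counts, so it is a net figure: it does not say which genes are detected in only one of the two groups. A sketch, on toy data with hypothetical gene IDs, of how the two sets could be compared directly via their indexes:

```python
import numpy as np
import pandas as pd

# Toy expression table; NaN means "not detected"
toy = pd.DataFrame(
    {
        "root, 3 days after sowing": [2.0, np.nan, np.nan, 0.5],
        "seed, 2 days after pollination": [5.0, 0.2, np.nan, np.nan],
    },
    index=["geneA", "geneB", "geneC", "geneD"],
)

sowing = toy.filter(regex="days after sowing").dropna(how="all")
pollination = toy.filter(regex="days after pollination").dropna(how="all")

# Genes detected in one group but not in the other
only_pollination = pollination.index.difference(sowing.index)
only_sowing = sowing.index.difference(pollination.index)

# Net difference of the row counts, as computed in the notebook
net_difference = len(pollination) - len(sowing)

print(list(only_pollination), list(only_sowing), net_difference)
```

In this toy case the net difference is zero although one gene is specific to each group, which is why the set comparison can be the more informative of the two.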
after_pollination.count()
seed, 2 days after pollination 26907
seed, 4 days after pollination 27090
leaf, 6 days after pollination 26035
internode, 6 days after pollination 25390
seed, 6 days after pollination 26935
seed, 8 days after pollination 27191
seed, 10 days after pollination 27919
leaf, 12 days after pollination 25413
internode, 12 days after pollination 26289
seed, 12 days after pollination 25672
endosperm, 12 days after pollination 24969
seed, 14 days after pollination 26669
endosperm, 14 days after pollination 23174
seed, 16 days after pollination 25538
plant embryo, 16 days after pollination 24076
endosperm, 16 days after pollination 22740
leaf, 18 days after pollination 25620
internode, 18 days after pollination 25385
seed, 18 days after pollination 26032
plant embryo, 18 days after pollination 25016
endosperm, 18 days after pollination 24444
pericarp, 18 days after pollination 25738
seed, 20 days after pollination 26429
plant embryo, 20 days after pollination 24794
endosperm, 20 days after pollination 23508
seed, 22 days after pollination 26463
plant embryo, 22 days after pollination 24485
endosperm, 22 days after pollination 23009
leaf, 24 days after pollination 25702
internode, 24 days after pollination 25442
seed, 24 days after pollination 26432
plant embryo, 24 days after pollination 25769
endosperm, 24 days after pollination 22742
leaf, 30 days after pollination 26485
internode, 30 days after pollination 25341
dtype: int64
after_pollination.filter(
regex='leaf').count()
leaf, 6 days after pollination 26035
leaf, 12 days after pollination 25413
leaf, 18 days after pollination 25620
leaf, 24 days after pollination 25702
leaf, 30 days after pollination 26485
dtype: int64
after_pollination.filter(
regex='endosperm').count()
endosperm, 12 days after pollination 24969
endosperm, 14 days after pollination 23174
endosperm, 16 days after pollination 22740
endosperm, 18 days after pollination 24444
endosperm, 20 days after pollination 23508
endosperm, 22 days after pollination 23009
endosperm, 24 days after pollination 22742
dtype: int64
Gene expression was measured on several days: 3, 6 and 7 days after sowing.
The samples taken 3 days after sowing came from the following organism parts:
- root
- differentiation zone of the primary root
- meristematic zone and elongation zone
- stele
- cortical parenchyma of the root
On day 6 after sowing, the following parts were examined:
- coleoptile
- primary root
One day later, the final measurements of this developmental stage were taken in the following parts:
- primary root
- root
- seminal root
- taproot zones 1 to 4
In the after-sowing samples, 7213 fewer genes were expressed than across all growth stages combined.
The number of expressed genes also rose with the number of days after sowing.
After 3 days the count was 22893-25025, at 6 days 25631-25759, and at 7 days 23977-26809.
An increase is apparent. It is unclear whether it is really caused by the number of days elapsed, and therefore by the growth phase, or whether it stems from the different sampling sites.
Two zones were each sampled on more than one day. These samples suggest that a higher number of days after sowing goes along with a higher number of expressed genes.
The samples in question are:
Day | Primary root | Root |
---|---|---|
3 | | 24582 |
6 | 25631 | |
7 | 26315 | 26103 |
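A day-by-zone comparison like this one can be derived from the column labels instead of being typed by hand. A sketch that parses the organism part and the day out of each label and pivots the result; the four counts are copied from the `after_sowing.count()` output, and the regular expression assumes the `"<part>, <n> days after sowing"` label pattern:

```python
import pandas as pd

# Detected-gene counts copied from the after_sowing.count() output
counts = pd.Series({
    "root, 3 days after sowing": 24582,
    "primary root, 6 days after sowing": 25631,
    "primary root, 7 days after sowing": 26315,
    "root, 7 days after sowing": 26103,
})

# Split each label into organism part and day
parsed = counts.index.str.extract(r"^(?P<part>.+), (?P<day>\d+) days after sowing$")

# One row per day, one column per organism part
table = (
    parsed.assign(day=parsed["day"].astype(int), genes=counts.to_numpy())
    .pivot(index="day", columns="part", values="genes")
)
print(table)
```

Cells without a measurement come out as NaN, which makes the gaps in the sampling scheme visible.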
After pollination, measurements were taken at 2-day intervals from day 2 to day 24, and again on day 30. As development progressed, samples were also taken from different sites. Parts tested included:
- seed
- leaf
- internode
- endosperm
- plant embryo
- pericarp
Compared with the after-sowing samples, 5136 more genes were expressed.
Here the number of expressed genes does not appear to be directly related to the number of days elapsed within the developmental stage. This cannot be stated with certainty, however, because the samples were taken from different sites.
The leaf samples were taken at regular 6-day intervals. The data show no linear relationship here.
Days after pollination | Number of expressed genes |
---|---|
6 | 26035 |
12 | 25413 |
18 | 25620 |
24 | 25702 |
30 | 26485 |
To make sure this is not just a property of the leaves, I also filtered the endosperm data by day. Here too, there is no linear relationship.
Days after pollination | Number of expressed genes |
---|---|
12 | 24969 |
14 | 23174 |
16 | 22740 |
18 | 24444 |
20 | 23508 |
22 | 23009 |
24 | 22742 |
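The statement that there is no linear relationship can be checked roughly with a Pearson correlation between day and gene count. A minimal sketch using the leaf and endosperm counts from the `count()` outputs; with only five and seven data points the coefficient is a coarse indicator, not a formal test, and `pearson_r` is a helper defined here:

```python
import numpy as np

# Counts of detected genes, copied from the count() outputs
leaf_days = np.array([6, 12, 18, 24, 30])
leaf_genes = np.array([26035, 25413, 25620, 25702, 26485])

endosperm_days = np.array([12, 14, 16, 18, 20, 22, 24])
endosperm_genes = np.array([24969, 23174, 22740, 24444, 23508, 23009, 22742])


def pearson_r(x, y):
    """Pearson correlation coefficient of two equally long 1-D arrays."""
    return float(np.corrcoef(x, y)[0, 1])


r_leaf = pearson_r(leaf_days, leaf_genes)
r_endosperm = pearson_r(endosperm_days, endosperm_genes)

print(f"leaf: r = {r_leaf:.2f}, endosperm: r = {r_endosperm:.2f}")
```

Both coefficients are far from +1 or -1 and differ in sign, which fits the observation that neither series follows a clear linear trend.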
import numpy as np
from pandas import DataFrame
import seaborn as sns
# %matplotlib inline
results = results.fillna(0)
results.set_index('Gene ID', inplace=True)
endosperm_after_pollination = after_pollination.filter(regex='endosperm')
endosperm_after_pollination.columns = endosperm_after_pollination.columns.str.replace("days after pollination", "")
endosperm_after_pollination.columns = endosperm_after_pollination.columns.str.replace("endosperm, ", "")
seed_after_pollination = after_pollination.filter(regex='seed')
seed_after_pollination.columns = seed_after_pollination.columns.str.replace("days after pollination", "")
seed_after_pollination.columns = seed_after_pollination.columns.str.replace("seed, ", "")
leaf_after_pollination = after_pollination.filter(regex='leaf')
leaf_after_pollination.columns = leaf_after_pollination.columns.str.replace("days after pollination", "")
leaf_after_pollination.columns = leaf_after_pollination.columns.str.replace("leaf, ", "")
internode_after_pollination = after_pollination.filter(regex='internode')
internode_after_pollination.columns = internode_after_pollination.columns.str.replace("days after pollination", "")
internode_after_pollination.columns = internode_after_pollination.columns.str.replace("internode, ", "")
plant_embryo_after_pollination = after_pollination.filter(regex='plant embryo')
plant_embryo_after_pollination.columns = plant_embryo_after_pollination.columns.str.replace("days after pollination", "")
plant_embryo_after_pollination.columns = plant_embryo_after_pollination.columns.str.replace("plant embryo, ", "")
pericarp_after_pollination = after_pollination.filter(regex='pericarp')
pericarp_after_pollination.columns = pericarp_after_pollination.columns.str.replace("days after pollination", "")
pericarp_after_pollination.columns = pericarp_after_pollination.columns.str.replace("pericarp, ", "")
seed_after_pollination
2 | 4 | 6 | 8 | 10 | 12 | 14 | 16 | 18 | 20 | 22 | 24 | |
---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 5.0 | 6.0 | 5.0 | 4.0 | 4.0 | 3.0 | 3.0 | 2.0 | 2.0 | 1.0 | 2.0 | 2.0 |
1 | 42.0 | 40.0 | 30.0 | 29.0 | 28.0 | 41.0 | 31.0 | 23.0 | 23.0 | 19.0 | 25.0 | 23.0 |
2 | NaN | 0.2 | 0.2 | NaN | 0.1 | NaN | 0.1 | NaN | NaN | NaN | 0.1 | 0.2 |
3 | 0.2 | 0.1 | 0.1 | 0.1 | 0.1 | NaN | NaN | NaN | 0.2 | 0.1 | NaN | NaN |
4 | 0.1 | NaN | 0.1 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
37838 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
37839 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 0.2 |
37840 | 0.2 | NaN | NaN | NaN | 0.2 | NaN | 0.5 | NaN | 0.2 | 0.1 | 0.2 | 0.1 |
37842 | 2.0 | 3.0 | 3.0 | 2.0 | 2.0 | 0.8 | 1.0 | 0.3 | 0.5 | 0.4 | 0.4 | 0.4 |
37843 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
35767 rows × 12 columns
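The six filter-and-rename steps above follow the same pattern and could be folded into one helper. A sketch on toy data; `columns_for_part` is a hypothetical name. Unlike the notebook code, it matches the label prefix exactly (so `"leaf"` would not also pick up `"thirteenth leaf"`) and returns integer day labels:

```python
import pandas as pd


def columns_for_part(df, part, suffix=" days after pollination"):
    """Select the columns of one organism part and relabel them by day."""
    prefix = f"{part}, "
    cols = [c for c in df.columns if c.startswith(prefix) and c.endswith(suffix)]
    subset = df[cols].copy()
    # Keep only the day number between prefix and suffix
    subset.columns = [int(c[len(prefix):-len(suffix)]) for c in cols]
    return subset


# Toy data with the notebook's column label pattern
toy = pd.DataFrame({
    "seed, 2 days after pollination": [5.0],
    "seed, 4 days after pollination": [6.0],
    "leaf, 6 days after pollination": [14.0],
})

seed = columns_for_part(toy, "seed")
leaf = columns_for_part(toy, "leaf")
print(list(seed.columns), list(leaf.columns))
```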
from sklearn.decomposition import PCA
from sklearn import preprocessing
import matplotlib.pyplot as plt
scaled_data = preprocessing.scale(results.T)
pca = PCA()
pca.fit(scaled_data)
pca_data = pca.transform(scaled_data)
per_var = np.round(pca.explained_variance_ratio_ * 100, decimals=1)
labels = ['PC' + str(x) for x in range(1, len(per_var)+1)]
plt.bar(x=range(1,len(per_var)+1), height=per_var, tick_label=labels)
plt.ylabel('Percentage of Explained Variance')
plt.xlabel('Principal Component')
plt.xticks(rotation=90)
plt.title('Scree Plot')
plt.show()
Result: The first two principal components capture the largest shares of the variance.
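A scree plot is easier to judge alongside the cumulative explained variance. The sketch below uses synthetic random data and a plain NumPy SVD in place of sklearn's `PCA` (a substitution made here to keep the example dependency-free); the 80% threshold is an arbitrary illustration:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(20, 50))  # synthetic: 20 samples x 50 genes

# Center and scale each gene, analogous to preprocessing.scale
Xs = (X - X.mean(axis=0)) / X.std(axis=0)

# Squared singular values are proportional to the variance per component
singular_values = np.linalg.svd(Xs, compute_uv=False)
explained_ratio = singular_values**2 / np.sum(singular_values**2)
cumulative = np.cumsum(explained_ratio)

# Number of components needed to reach 80% of the variance
n_for_80 = int(np.searchsorted(cumulative, 0.80) + 1)

print(np.round(explained_ratio[:5] * 100, 1), n_for_80)
```

With sklearn, the same cumulative curve would be `np.cumsum(pca.explained_variance_ratio_)`.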
pca_df = pd.DataFrame(pca_data, columns=labels, index=results.columns)
plt.rcParams["figure.figsize"] = (15,15)
plt.scatter(pca_df.PC1, pca_df.PC2)
plt.title('PCA Graph')
plt.xlabel(f'PC1- {per_var[0]}%')
plt.ylabel(f'PC2- {per_var[1]}%')
pca_df
# Label each point with the organism part (the text before the first comma)
for sample in pca_df.index:
    plt.annotate(sample.partition(',')[0], (pca_df.PC1.loc[sample], pca_df.PC2.loc[sample]), rotation=45)
plt.show()
loading_scores = pd.Series(pca.components_[0], index=results.index)
sorted_loading_scores = loading_scores.abs().sort_values(ascending=False)
top_10_genes = sorted_loading_scores[0:10].index.values
loading_scores[top_10_genes]
Gene ID
Zm00001eb059170 -0.012067
Zm00001eb246940 -0.012066
Zm00001eb216140 -0.012039
Zm00001eb077390 -0.012038
Zm00001eb423880 -0.012003
Zm00001eb301640 -0.011932
Zm00001eb077380 -0.011899
Zm00001eb232720 -0.011895
Zm00001eb395490 -0.011878
Zm00001eb285560 -0.011875
dtype: float64
df = results.loc[["Zm00001eb059170", "Zm00001eb246940", "Zm00001eb216140"]]
df
root, 3 days after sowing | differentiation zone of primary root, 3 days after sowing | meristematic zone and elongation zone, 3 days after sowing | stele, 3 days after sowing | cortical parenchyma of root, 3 days after sowing | coleoptile, 6 days after sowing | primary root, 6 days after sowing | primary root, 7 days after sowing | root, 7 days after sowing | seminal root, 7 days after sowing | ... | seed, 24 days after pollination | plant embryo, 24 days after pollination | endosperm, 24 days after pollination | leaf, 30 days after pollination | internode, 30 days after pollination | thirteenth leaf, whole plant fruit formation stage 30 to 50% | thirteenth leaf, whole plant flowering stage | pre-pollination cob, whole plant flowering stage | anthers, whole plant flowering stage | silks, whole plant flowering stage | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Gene ID | |||||||||||||||||||||
Zm00001eb059170 | 10.0 | 10.0 | 15.0 | 11.0 | 9.0 | 13.0 | 8.0 | 11.0 | 11.0 | 14.0 | ... | 10.0 | 20.0 | 7.0 | 4.0 | 11.0 | 2.0 | 1.0 | 18.0 | 4.0 | 11.0 |
Zm00001eb246940 | 23.0 | 19.0 | 32.0 | 30.0 | 10.0 | 31.0 | 22.0 | 22.0 | 23.0 | 23.0 | ... | 30.0 | 50.0 | 26.0 | 9.0 | 24.0 | 5.0 | 6.0 | 48.0 | 10.0 | 35.0 |
Zm00001eb216140 | 6.0 | 3.0 | 7.0 | 5.0 | 3.0 | 9.0 | 6.0 | 5.0 | 5.0 | 5.0 | ... | 4.0 | 8.0 | 3.0 | 1.0 | 4.0 | 0.8 | 0.8 | 9.0 | 2.0 | 4.0 |
3 rows × 92 columns
The first two principal components capture the largest shares of the variance (11-16% each). All other principal components each explain less than 10%.
The PCA shows distinct clusters for the different plant parts. The seed samples correlate clearly with one another. The endosperm and leaf samples also form distinct clusters. The seed and endosperm clusters lie close together, which suggests that these plant parts behave similarly in their gene expression. These clusters differ markedly from the leaf samples. The large distance between the clusters indicates that gene expression in the leaves behaves quite differently from that in the seeds. In general, the leaf clusters separate clearly from all other sampling sites.
Examining which genes separate the clusters most strongly gave the following result:
- Zm00001eb059170
- Zm00001eb246940
- Zm00001eb216140
- Zm00001eb077390
- Zm00001eb423880
- Zm00001eb301640
- Zm00001eb077380
- Zm00001eb232720
- Zm00001eb395490
- Zm00001eb285560
According to the principal component analysis, these 10 genes show the largest expression differences: they have the largest absolute loadings on PC1. They therefore differ most strongly in the leaves compared with all other plant parts. Selecting these genes from the results table supports the PCA: the genes listed are expressed markedly less in the leaves than in all other plant parts.
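The leaf-versus-rest comparison can be made explicit by averaging each gene over the leaf columns and over all remaining columns. A sketch on toy values with hypothetical gene IDs (not the measured data); on the real table the same steps would run on `results.loc[top_10_genes]`:

```python
import pandas as pd

# Toy expression values; gene IDs and numbers are invented for illustration
toy = pd.DataFrame(
    {
        "leaf, 6 days after pollination": [4.0, 9.0],
        "leaf, 30 days after pollination": [2.0, 5.0],
        "seed, 24 days after pollination": [10.0, 30.0],
        "plant embryo, 24 days after pollination": [20.0, 50.0],
    },
    index=["geneA", "geneB"],
)

# Boolean mask over the columns: True for leaf samples
is_leaf = toy.columns.str.contains("leaf")

summary = pd.DataFrame({
    "mean in leaf": toy.loc[:, is_leaf].mean(axis=1),
    "mean elsewhere": toy.loc[:, ~is_leaf].mean(axis=1),
})
summary["ratio"] = summary["mean in leaf"] / summary["mean elsewhere"]
print(summary)
```

A ratio well below 1 for a gene would correspond to the lower leaf expression described above.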