- Used machine learning and light-curve visualizations to determine which model is best suited to detecting exoplanets. Datasets were gathered from NASA's Exoplanet Archive using queries written to match the structure of the Exoplanet Archive's API
- Built and tested different graphs and machine learning models in Google Colaboratory
- Extracted datasets from the NASA Exoplanet Archive and manipulated them with Pandas in Python
- Implemented the Synthetic Minority Over-sampling Technique (SMOTE) to augment the data used to train a Logistic Regression model, and tested several classification models; the best result was 99.97% training accuracy (99.56% on the test set)
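
As a rough illustration of the query step, the sketch below assembles a URL for the Exoplanet Archive's TAP endpoint without sending any request. The helper name `build_tap_url`, the `cumulative` table, and the column names are assumptions chosen for the example; the project's actual queries are not shown in this README.

```python
from urllib.parse import urlencode

# Synchronous TAP (Table Access Protocol) endpoint of the NASA Exoplanet Archive.
TAP_SYNC_URL = "https://exoplanetarchive.ipac.caltech.edu/TAP/sync"


def build_tap_url(columns, table, where=None, fmt="csv"):
    """Assemble a TAP query URL. Nothing is sent over the network here."""
    query = f"select {','.join(columns)} from {table}"
    if where:
        query += f" where {where}"
    return f"{TAP_SYNC_URL}?{urlencode({'query': query, 'format': fmt})}"


# Hypothetical example: table and column names are assumptions, not the project's query.
url = build_tap_url(
    ["kepid", "koi_disposition", "koi_period"],
    "cumulative",
    where="koi_disposition='CONFIRMED'",
)
print(url)
```

A URL built this way could then be passed to `pandas.read_csv`, which accepts URLs, to load the result into a DataFrame.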
Regular dips at consistent intervals (Green)
![Screenshot 2023-09-28 at 12 44 56 AM](https://private-user-images.githubusercontent.com/35633980/271176784-113633c7-230b-4bde-8dfc-963372a2b045.png?jwt=eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJpc3MiOiJnaXRodWIuY29tIiwiYXVkIjoicmF3LmdpdGh1YnVzZXJjb250ZW50LmNvbSIsImtleSI6ImtleTUiLCJleHAiOjE3Mzk0NTM3NDksIm5iZiI6MTczOTQ1MzQ0OSwicGF0aCI6Ii8zNTYzMzk4MC8yNzExNzY3ODQtMTEzNjMzYzctMjMwYi00YmRlLThkZmMtOTYzMzcyYTJiMDQ1LnBuZz9YLUFtei1BbGdvcml0aG09QVdTNC1ITUFDLVNIQTI1NiZYLUFtei1DcmVkZW50aWFsPUFLSUFWQ09EWUxTQTUzUFFLNFpBJTJGMjAyNTAyMTMlMkZ1cy1lYXN0LTElMkZzMyUyRmF3czRfcmVxdWVzdCZYLUFtei1EYXRlPTIwMjUwMjEzVDEzMzA0OVomWC1BbXotRXhwaXJlcz0zMDAmWC1BbXotU2lnbmF0dXJlPTkwZTVlZTMwNjIwY2U1YTYyODdiNGY0MzBiNDlkMjFkZTY4ZGU4ZDNiZGMxMjg1YjhlOTBmMzQ3ZTQ3NGMzOGQmWC1BbXotU2lnbmVkSGVhZGVycz1ob3N0In0.UcH2qA4AImbhk0W667igo4MY2BvgSa400XrhAiaf-_I)
Mostly constant flux measurements (Orange)
![Screenshot 2023-09-28 at 12 45 14 AM](https://private-user-images.githubusercontent.com/35633980/271176796-e090b101-8760-463e-8a23-6f94587dbe63.png?jwt=eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJpc3MiOiJnaXRodWIuY29tIiwiYXVkIjoicmF3LmdpdGh1YnVzZXJjb250ZW50LmNvbSIsImtleSI6ImtleTUiLCJleHAiOjE3Mzk0NTM3NDksIm5iZiI6MTczOTQ1MzQ0OSwicGF0aCI6Ii8zNTYzMzk4MC8yNzExNzY3OTYtZTA5MGIxMDEtODc2MC00NjNlLThhMjMtNmY5NDU4N2RiZTYzLnBuZz9YLUFtei1BbGdvcml0aG09QVdTNC1ITUFDLVNIQTI1NiZYLUFtei1DcmVkZW50aWFsPUFLSUFWQ09EWUxTQTUzUFFLNFpBJTJGMjAyNTAyMTMlMkZ1cy1lYXN0LTElMkZzMyUyRmF3czRfcmVxdWVzdCZYLUFtei1EYXRlPTIwMjUwMjEzVDEzMzA0OVomWC1BbXotRXhwaXJlcz0zMDAmWC1BbXotU2lnbmF0dXJlPTkyOWQ4YjhlMGFkZTFkMDdkYzIyMTdmODkyNmNjZDI1M2JlZDQ1YzBmNjkwZjhiZDA3Njc4OGVmNWY5MzlhZmEmWC1BbXotU2lnbmVkSGVhZGVycz1ob3N0In0.iQIdireYBqrUB7Tzj11qxZ2FVgrqD5Edn82N0r_EKvM)
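
The difference between the two kinds of light curve shown above can be mimicked with synthetic data. This is only a sketch under simplifying assumptions: the flux is normalized to 1.0, the noise is bounded and uniform, the transit is a flat-bottomed dip, and every parameter value is invented for illustration rather than taken from the project's data.

```python
import random


def synthetic_light_curve(n_points=300, period=50, transit_width=3,
                          depth=0.01, noise=0.001, seed=0):
    """Return normalized flux values; period=None models a star with no transits."""
    rng = random.Random(seed)
    flux = []
    for t in range(n_points):
        value = 1.0 + rng.uniform(-noise, noise)  # baseline brightness plus bounded noise
        if period is not None and t % period < transit_width:
            value -= depth  # the planet blocks a small fraction of the starlight
        flux.append(value)
    return flux


def dip_times(flux, threshold=0.995):
    """Indices at which the flux falls below the threshold."""
    return [t for t, f in enumerate(flux) if f < threshold]


with_planet = synthetic_light_curve()
without_planet = synthetic_light_curve(period=None)

print(dip_times(with_planet)[:6])   # dips repeat every 50 samples
print(dip_times(without_planet))    # no dips for the constant-flux star
```

Real light curves are noisier and their dips are shallower and less regular, which is why the project relies on trained classifiers rather than a fixed threshold.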
KNN Algorithm
- Classifies each data point by the majority label of its nearest labeled neighbors, separating exoplanet from non-exoplanet data
- Accuracy: ~99.31% (train), ~99.12% (test)
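
A minimal sketch of how a k-nearest-neighbors vote works, written in plain Python for readability. The two features and all values are hypothetical, and this is not the project's implementation; a library classifier such as scikit-learn's `KNeighborsClassifier` would be the more usual choice in practice.

```python
from collections import Counter
import math


def knn_predict(train_points, train_labels, query, k=3):
    """Label `query` by majority vote among its k nearest training points."""
    by_distance = sorted(
        zip(train_points, train_labels),
        key=lambda pair: math.dist(pair[0], query),  # Euclidean distance
    )
    votes = Counter(label for _, label in by_distance[:k])
    return votes.most_common(1)[0][0]


# Hypothetical two-feature summaries of light curves: (dip depth, dip regularity).
points = [(0.010, 0.90), (0.012, 0.80), (0.009, 0.95),
          (0.001, 0.10), (0.002, 0.20), (0.000, 0.05)]
labels = ["exoplanet"] * 3 + ["non-exoplanet"] * 3

print(knn_predict(points, labels, (0.011, 0.85)))
print(knn_predict(points, labels, (0.001, 0.15)))
```

Because the vote depends on raw distances, features on very different scales would normally be standardized first.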
Logistic Regression
- Assigns each data point a probability of being an exoplanet, then classifies the data point as an exoplanet or not based on that probability.
- Accuracy: ~99.97% (train), ~99.56% (test)
- SMOTE (Synthetic Minority Over-sampling Technique) was used to augment the data: starting from the existing minority data (exoplanets), SMOTE creates additional synthetic minority samples to help balance the dataset. This reduces the model's bias towards the majority class.
![Screenshot 2023-09-28 at 12 39 14 AM](https://private-user-images.githubusercontent.com/35633980/271176309-52614996-0847-4ca3-8374-80fa7c0d3791.png?jwt=eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJpc3MiOiJnaXRodWIuY29tIiwiYXVkIjoicmF3LmdpdGh1YnVzZXJjb250ZW50LmNvbSIsImtleSI6ImtleTUiLCJleHAiOjE3Mzk0NTM3NDksIm5iZiI6MTczOTQ1MzQ0OSwicGF0aCI6Ii8zNTYzMzk4MC8yNzExNzYzMDktNTI2MTQ5OTYtMDg0Ny00Y2EzLTgzNzQtODBmYTdjMGQzNzkxLnBuZz9YLUFtei1BbGdvcml0aG09QVdTNC1ITUFDLVNIQTI1NiZYLUFtei1DcmVkZW50aWFsPUFLSUFWQ09EWUxTQTUzUFFLNFpBJTJGMjAyNTAyMTMlMkZ1cy1lYXN0LTElMkZzMyUyRmF3czRfcmVxdWVzdCZYLUFtei1EYXRlPTIwMjUwMjEzVDEzMzA0OVomWC1BbXotRXhwaXJlcz0zMDAmWC1BbXotU2lnbmF0dXJlPTE2NTE4ZThhZTk2NDQxN2FlZDVkMmE0MTg0MjQ4NzNhODY1ZGUwNGRmNDUyMDg5NzcxMjU1NTQxNGVlZWVkMmQmWC1BbXotU2lnbmVkSGVhZGVycz1ob3N0In0.13CFE2DRq4R4Xz2gnWDlkFaIG2rLAo_U2fPrH-Pfjo4)
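
The two ideas in this section, thresholding a predicted probability and interpolating new minority samples, can be sketched in plain Python. This is a simplified illustration, not the project's code: the weights, bias, and feature values are made up, and `smote_like_samples` is a hypothetical helper that only captures the core interpolation step of SMOTE. Libraries such as imbalanced-learn (`imblearn.over_sampling.SMOTE`) provide full implementations.

```python
import math
import random


def exoplanet_probability(features, weights, bias):
    """Logistic regression: squash a weighted sum into a probability in (0, 1)."""
    z = bias + sum(w * x for w, x in zip(weights, features))
    return 1.0 / (1.0 + math.exp(-z))


def classify(features, weights, bias, threshold=0.5):
    """Call the point an exoplanet when its probability reaches the threshold."""
    return exoplanet_probability(features, weights, bias) >= threshold


def smote_like_samples(minority, n_new, k=2, seed=0):
    """Create n_new synthetic points, each lying on the segment between a
    minority point and one of its k nearest minority neighbors."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        neighbors = sorted(
            (p for p in minority if p is not base),
            key=lambda p: math.dist(p, base),
        )[:k]
        neighbor = rng.choice(neighbors)
        gap = rng.random()  # position along the segment, in [0, 1)
        synthetic.append(tuple(b + gap * (n - b) for b, n in zip(base, neighbor)))
    return synthetic


# Hypothetical minority (exoplanet) samples and model parameters.
minority = [(0.010, 0.90), (0.012, 0.80), (0.009, 0.95)]
new_points = smote_like_samples(minority, n_new=4)
print(new_points)
print(classify((2.0, 1.0), weights=(1.5, -0.5), bias=-1.0))
```

Oversampling of this kind is normally applied to the training split only, so that the test accuracy still reflects real, unaugmented data.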
Complex Machine Learning Algorithms
- MLP (Multi-layer Perceptron) with augmented data: 99.83% accuracy
- Neural network (TensorFlow and Keras): 99.47% accuracy
- CNN (Convolutional Neural Network): 99.67% accuracy
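
To give a feel for why a convolutional model suits light curves, the sketch below slides a single small kernel along a flux series and finds where it responds most strongly. The kernel here is hand-picked and hypothetical; a CNN learns many such kernels from the training data, and this is not the network used in the project.

```python
def conv1d_valid(signal, kernel):
    """Slide the kernel over the signal (no padding) and return the responses."""
    width = len(kernel)
    return [
        sum(k * s for k, s in zip(kernel, signal[i:i + width]))
        for i in range(len(signal) - width + 1)
    ]


# Hypothetical kernel that responds when the middle sample is lower than its neighbors.
dip_kernel = [0.5, -1.0, 0.5]
flux = [1.0, 1.0, 1.0, 0.99, 1.0, 1.0]  # a single small dip at index 3

responses = conv1d_valid(flux, dip_kernel)
best_window = max(range(len(responses)), key=responses.__getitem__)
print(best_window)  # the window starting at index 2 is centered on the dip
```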
NASA's current efforts use the transit photometry method as a reliable way to measure how consistently a star's brightness dips, and therefore to infer the presence of an orbiting exoplanet. Among the models tested, logistic regression offers the simplest and most robust way to analyze and interpret transit photometry data, particularly when considering ease of use, readability, and how little code it requires.
Transit photometry could also be used to gather more data about the exoplanet itself once the pattern of dips has been identified and the exoplanet has been confirmed. Properties of the exoplanet such as mass could be readily explored, whereas more complicated details such as orbiting moons, atmospheric data, and planet type might be collected with a more powerful successor to, or enhancement of, transit photometry that builds on what present methods can already gather.