This project investigates whether a text-to-image model can be built without any labeled training data, relying only on CLIP and unlabeled images. The best description of the project is here. At this point, there's little reason for anybody but me to run any of this code.