In terms of model behavior, some instances of hallucination are indistinguishable from creativity. We investigate this in the following way:
- Coding-creativity identification: linear probes, mean differencing, sparse autoencoder (SAE).
- Three different prompt templates:
  - Factual: “Is there a python package for ”
  - Neutral: “Give me a python package for ”
  - Creative: “Make up a python package for ”
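As a starting point, here is a minimal sketch of the templates and the mean-differencing step, assuming residual-stream activations have already been cached per prompt at a given layer; the function and variable names (e.g. `build_prompts`, `mean_difference_direction`) are illustrative rather than part of any existing codebase:

```python
import torch

# The three templates from above; the task description is appended via {task}.
TEMPLATES = {
    "factual": "Is there a python package for {task}",
    "neutral": "Give me a python package for {task}",
    "creative": "Make up a python package for {task}",
}

def build_prompts(task: str) -> dict[str, str]:
    """Render one coding task into all three prompt variants."""
    return {name: tpl.format(task=task) for name, tpl in TEMPLATES.items()}

def mean_difference_direction(creative_acts: torch.Tensor,
                              factual_acts: torch.Tensor) -> torch.Tensor:
    """Candidate creativity direction at one layer: difference of the mean
    activation over creative prompts and over factual prompts, normalised
    to unit length. Both inputs have shape (num_prompts, hidden_dim)."""
    direction = creative_acts.mean(dim=0) - factual_acts.mean(dim=0)
    return direction / direction.norm()
```

Linear probes and SAE features would slot in as alternative ways of producing the same kind of candidate direction.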
The idea is to identify a creativity vector and a hallucination vector; restricting the domain (only python packages) should help identification. Do we need to build a causal model of this? Perhaps, to better conceptualize our design. Once we have a vector, we perform interventions, removing this vector from the activations at each layer of the forward pass (a sketch follows below). Can we demonstrate that prompts like “give me a python package for ” are more likely to induce hallucination than “is there a python package for ”? Essentially, the model is confused: when humans ask for a python package, they mean they want an existing package for x, but the same request phrased as a command could be misunderstood as an instruction to invent a package for x.
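A minimal sketch of that intervention, assuming a Hugging Face `LlamaForCausalLM` and a unit-norm direction from the identification step; the hook wiring is illustrative and may need adjusting to the exact transformers version:

```python
import torch

def make_ablation_hook(direction: torch.Tensor):
    """Forward hook that removes the component along `direction` from a
    decoder layer's output hidden states."""
    direction = direction / direction.norm()

    def hook(module, inputs, output):
        # LLaMA decoder layers return a tuple whose first element is the
        # hidden states; handle both tuple and plain-tensor outputs.
        hidden = output[0] if isinstance(output, tuple) else output
        coeff = (hidden * direction).sum(dim=-1, keepdim=True)
        hidden = hidden - coeff * direction
        return (hidden,) + output[1:] if isinstance(output, tuple) else hidden

    return hook

def ablate_direction_everywhere(model, direction: torch.Tensor):
    """Register the ablation hook on every decoder layer; the returned
    handles let us remove the intervention after generation."""
    direction = direction.to(device=model.device, dtype=model.dtype)
    return [layer.register_forward_hook(make_ablation_hook(direction))
            for layer in model.model.layers]
```

Calling `handle.remove()` on each returned handle restores the unmodified model, so the same prompts can be compared with and without the direction ablated.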
We use LLaMA-2 7B: it is open source and capable, yet small enough to work with.
Our investigation requires a narrow dataset: python package queries. We use GPT-4o to generate coding tasks, then build prompts from the three templates: factual, neutral, and creative. Prompts are fed into LLaMA-2 7B, which generates responses. Each prompt-response pair is then evaluated by GPT-4o and labeled as truthful or hallucination.
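A minimal sketch of the generation and labeling steps, assuming the OpenAI Python client for GPT-4o and Hugging Face transformers for LLaMA-2 7B; the judging prompt and label parsing are simplified placeholders:

```python
from openai import OpenAI
from transformers import AutoModelForCausalLM, AutoTokenizer

client = OpenAI()
MODEL_ID = "meta-llama/Llama-2-7b-hf"
tok = AutoTokenizer.from_pretrained(MODEL_ID)
llama = AutoModelForCausalLM.from_pretrained(MODEL_ID, device_map="auto")

def generate_response(prompt: str, max_new_tokens: int = 128) -> str:
    """Greedy LLaMA-2 7B completion for one templated prompt."""
    inputs = tok(prompt, return_tensors="pt").to(llama.device)
    out = llama.generate(**inputs, max_new_tokens=max_new_tokens, do_sample=False)
    return tok.decode(out[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True)

def label_response(prompt: str, response: str) -> str:
    """Ask GPT-4o whether the recommended package actually exists; expect
    the single word 'truthful' or 'hallucination' back."""
    judge = client.chat.completions.create(
        model="gpt-4o",
        messages=[{
            "role": "user",
            "content": (
                "Does the python package named in the response actually exist? "
                "Answer with exactly one word: truthful or hallucination.\n\n"
                f"Prompt: {prompt}\nResponse: {response}"
            ),
        }],
    )
    return judge.choices[0].message.content.strip().lower()
```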
Data are organized in the following file structure:
<coding-prompt-type>/<dataset-name>/<model>
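For example, factual-prompt responses from a package-query dataset generated with LLaMA-2 7B might land under (the dataset and model names here are illustrative):

```
factual/python-package-queries/llama2-7b
```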