added prompting structure #15

Closed
wants to merge 1 commit into from
54 changes: 54 additions & 0 deletions AITutor_Backend/src/DataUtils/wiki_utils.py
@@ -0,0 +1,54 @@
import requests
import mwparserfromhell
import wikipedia
import numpy as np

def get_wikidata_from_title(title: str):
    """Fetch a Wikipedia page by title and return (success, data_or_error)."""
    try:
        content = wikipedia.page(title)
        parsed_wikidata = {
            "title": content.title,
            "content": content.content,
            "images": content.images,
            "links": content.links
        }
        return True, parsed_wikidata
    except Exception:  # e.g. the page does not exist or the title is ambiguous
        return False, {'error': f"Error: not a valid Wikipedia Page. Check the title and ensure you entered it correctly (title-{title})"}

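if __name__ == "__main__":
    # manual smoke test; guarded so that importing the module does not trigger a network call
    print(get_wikidata_from_title("som random thing"))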

def get_titles(search_query):
    return wikipedia.search(search_query.encode("utf-8"), results=20)

def edit_distance(s1, s2):
    n = len(s1)
    m = len(s2)
    # create a matrix of zeros of shape (len(s1)+1, len(s2)+1)
    distance_matrix = np.zeros((n+1, m+1))
    # fill in the comparisons against the empty string: 1, 2, 3, 4... n (and ... m)
    for row in range(1, n+1):
        distance_matrix[row, 0] = distance_matrix[row-1, 0] + 1
    for col in range(1, m+1):
        distance_matrix[0, col] = distance_matrix[0, col-1] + 1
    # iterate over the matrix and fill it
    for i in range(1, n+1):
        for j in range(1, m+1):
            # an insertion or a deletion adds 1
            insertion = distance_matrix[i, j-1] + 1
            deletion = distance_matrix[i-1, j] + 1
            # if the characters differ, this is a substitution, which costs 2
            if s1[i-1] != s2[j-1]:
                replace_same = distance_matrix[i-1, j-1] + 2
            # if the characters are the same, nothing is added since there is no cost
            else:
                replace_same = distance_matrix[i-1, j-1]
            # pick the min
            distance_matrix[i, j] = min([insertion, deletion, replace_same])
    # return the min distance
    return int(distance_matrix[n][m])

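def get_wikidata_from_topic(topic):
    titles = get_titles(topic)
    if not titles:  # no search results, so there is nothing to rank
        return False, {'error': f"Error: no Wikipedia pages found for topic ({topic})"}
    titles.sort(key=lambda x: edit_distance(x, topic))
    return get_wikidata_from_title(titles[0])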
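
Because `edit_distance` charges 1 for an insertion or deletion and 2 for a substitution, its result equals `len(s1) + len(s2) - 2 * LCS(s1, s2)` rather than the classic Levenshtein distance. The following is a minimal pure-Python sketch of the same recurrence, not the PR's implementation. The names `indel_distance` and `rank_titles` are hypothetical, and the lowercasing in `rank_titles` is an assumption (the comparison in the PR is case-sensitive).

```python
def indel_distance(s1, s2, substitution_cost=2):
    """Edit distance with unit insert/delete cost and a configurable substitution cost."""
    # prev[j] is the distance between the first i-1 chars of s1 and the first j chars of s2.
    prev = list(range(len(s2) + 1))
    for i, ch1 in enumerate(s1, start=1):
        curr = [i]  # distance from s1[:i] to the empty string
        for j, ch2 in enumerate(s2, start=1):
            insertion = curr[j - 1] + 1
            deletion = prev[j] + 1
            replace = prev[j - 1] + (0 if ch1 == ch2 else substitution_cost)
            curr.append(min(insertion, deletion, replace))
        prev = curr
    return prev[-1]


def rank_titles(topic, titles):
    """Hypothetical helper: order candidate titles by case-insensitive distance to the topic."""
    return sorted(titles, key=lambda t: indel_distance(t.lower(), topic.lower()))


if __name__ == "__main__":
    print(indel_distance("kitten", "sitting"))
    print(rank_titles("neural network",
                      ["Network", "Artificial neural network", "Neural network"]))
```

Keeping only two rows avoids the NumPy dependency and the full `(n+1) x (m+1)` matrix. Passing `substitution_cost=1` gives the standard Levenshtein distance if that ranking is preferred.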
@@ -0,0 +1,18 @@
Create a Concept Graph, i.e. Concepts and their sub Concepts, which fully covers Artificial Intelligence from a Computer Science context. Output the generic list format and then convert the output into the following plaintext format separated by new lines and tabs:

1. Concept A
    - Concept B
        - concept c

2. Concept D
...

```plaintext
concept a
\tconcept b
\t\tconcept c
concept d
...
```

Note: It is important to include as much academic detail as possible since the graph you create will be used to teach a student the topic.
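
For reference, the tab-indented plaintext format this prompt asks for can be produced from a nested structure roughly as follows. This is a minimal sketch: `iter_tabbed_lines` and `to_tabbed_text` are hypothetical helpers that are not part of the PR, the `concept`/`refs` keys are borrowed from the tree built in `concepts.py`, and lowercasing the names is an assumption based on the example output in the prompt.

```python
def iter_tabbed_lines(nodes, depth=0):
    """Yield one line per concept, prefixed with one tab per nesting level."""
    for node in nodes:
        yield "\t" * depth + node["concept"].lower()
        # Recurse into sub-concepts one level deeper.
        yield from iter_tabbed_lines(node.get("refs", []), depth + 1)


def to_tabbed_text(nodes):
    """Join the generated lines into the newline-separated plaintext format."""
    return "\n".join(iter_tabbed_lines(nodes))


if __name__ == "__main__":
    graph = [
        {"concept": "Concept A", "refs": [
            {"concept": "Concept B", "refs": [
                {"concept": "concept c", "refs": []},
            ]},
        ]},
        {"concept": "Concept D", "refs": []},
    ]
    print(to_tabbed_text(graph))
```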
@@ -1,14 +1,14 @@
## Environment Backstory and Call to Action
Take on the role of an expert and all-knowing AI Tutor. Create and outline a detailed plan in Natural Language that you can use to develop a SlidePlan data Structure Object. As the Tutor, it is your responsibility to incorporate the student's learning outcomes and cover what the student wants to learn. You will need to detail all fields required by the SlidePlan; this will help you in the future. You will have access to all previous SlidePlans created and a list of all Concepts via a ConceptDatabase, each of which has a value associated with the number of times that concept has already been explored by our current SlidePlan set.
Take on the role of an expert and all-knowing AI Tutor. Create and outline a detailed List of slide plans in Natural Language which represents a complete presentation for the topic presented to you. As the Tutor, it is your responsibility to incorporate the student's learning outcomes and cover what the student wants to learn. You will need to detail all fields required by the SlidePlan; this will help you in the future.

## Documentation
Your upcoming task will involve crafting a singular SlidePlan that will later be transformed into an actual Slide. Each SlidePlan must be meticulously planned, considering the following structure:
Your upcoming task will involve crafting a bulleted list of SlidePlans that will later be transformed into actual Slides. Each SlidePlan must be meticulously planned, as it will be converted into a document used to teach a student. Observe the following structure.

### SlidePlan Data Structure
You will be constructing SlidePlans with the following structure:
You will be constructing a bulleted list of SlidePlans; each SlidePlan must follow the structure below:

- Title: A descriptive title for the slide.
- Purpose: ENUM (Introductory, Relative, Exploratory, Explanative, Examplative) indicating the primary role of the slide in the lesson.
- Purpose: (Introductory, Relative, Exploratory, Explanative, Examplative) indicating the primary role of the slide in the lesson.
- Purpose Statement: A brief explanation of what the slide aims to achieve or convey.
- Concepts: A list of relevant Concepts (derived from the Concept Database) that the slide will address.

@@ -41,10 +41,11 @@ Students learn based on Concepts. We develop learning material based on concepts
- E.g. we provide an example for communication and persuasion by using the slide to ask the user to discuss a topic and conversing with the user on the topic.

### How to Create an Optimal SlidePlan:
Consider the Concepts Provided for you in the environment. These concepts will need to be covered by your list of slideplans. The goal of the slideplans is to provide the student with enough learning material to prepare them for being tested on

**SlidePlan Set Structure:**
**SlidePlan List Structure:**
- SlidePlans Purposes should follow some cohesive structure, such as:
Consider SlidePlan Set made for some concepts c_i, c_j, c_k;
Consider SlidePlan Set made for some concepts c_i, c_j, c_k where (c_i, c_j, c_k) all exist in the ConceptDatabase;
SlidePlan 1: We introduce c_i
SlidePlan 2: we explain c_i
SlidePlan 3: we example c_i
@@ -53,42 +54,26 @@ SlidePlan 5: We explore (c_j, c_k),
SlidePlan 6: we explain (c_j, c_k),
SlidePlan 7: we example (c_j, c_k)
SlidePlan 8: we relate (c_i, c_k)
This is the idea, as it would not make sense to relate c_i and c_k without introducing/exploring them first.

**Current SlidePlans Analysis:**
Review the existing SlidePlans, examining their titles, purposes, purpose statements, and associated concepts.

**Concept Exploration Mapping:**
Analyze the Concept Database to understand the learning outcomes of the student; this will help you focus on what concepts to discuss.

**Strategic Planning:**
Focus on discovering unused concepts, emphasizing complex ones, and ensuring a logical progression in SlidePlans. For instance, after introducing concept 'A', explore its applications, relate it to concept 'B', and then delve into concept 'B's intricacies.
...

**Description of a Good SlidePlan Plan**:
Craft in natural language a SlidePlan Plan based on the current educational state, ensuring alignment with the student's learning journey and the Concept Database's scope. Your output should be a detailed plan in plain text, capturing the essence of the Slide's purpose, Title Content, and Conceptual Relevance based on the ConceptDatabase.

## Assessing the Environment
- **Current SlidePlan Set:**
Reflect on existing SlidePlans to avoid redundancy and ensure comprehensive coverage of topics.

- **Concept Database and Exploratory Values:** Use the Concept Database to identify key concepts that need to be introduced or further explored. Also, use the Exploratory Values to determine which concepts need to be explored further.

- **Notebank:** This Notebank is a plan you have previously developed to help you with this process. Use it to assess what the student wants to learn and/or focus on in the lesson. Whatever plan you have created already, you should aim to stick by it. The student's Slide Preference Statement is important to pay attention to.

- **Purpose of Slides:** Determine the most appropriate purpose for each SlidePlan based on the student's current understanding and the flow of information.
Rules for the AI Tutor

## Environment
- **Current SlidePlan Set**:
<SlidePlanSet>
$ENV.SLIDE_PLAN_SET$
</SlidePlanSet>

- **Concept Database and Exploratory Values:**
- **Concept Database:**
<ConceptDatabase>
Concept | Exploratory Value
-----------------------------------------
$ENV.CONCEPTS_EXPLORED_VALUES$
$ENV.CONCEPTS$
</ConceptDatabase>

- **Notebank**:
@@ -99,8 +84,10 @@ $ENV.NOTEBANK_STATE$
## Rules
- Ensure that each SlidePlan is unique and contributes to the overall learning experience.
- Select concepts that are relevant and appropriate (and exist in the ConceptDatabase) for the slide's purpose.
Create a SlidePlan that logically progresses from previous content, building upon established knowledge.
- Focus on concept-based learning to enhance the student's comprehension and retention.
- Create each SlidePlan in a way that logically progresses from the previous one, building upon established knowledge.
- Focus on concept-based learning to enhance the student's comprehension and retention. Ensure all concepts defined exist in the Concept Database.
- Determine the most appropriate purpose for each SlidePlan based on the student's current understanding and the flow of information.


## Your Task
In the Slide Planning Phase, your role is to create a single detailed SlidePlan that facilitates effective learning. Utilize the provided information about the ConceptDatabase and existing SlidePlans to guide your planning. Make sure your plan is comprehensive, aligns with the educational goals, and adheres to the structural requirements of the SlidePlan.
Utilize the provided information about the ConceptDatabase to make a comprehensive and intuitive Slide Plan List which aligns with the educational goals and adheres to the structural requirements of the SlidePlan.
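
The SlidePlan fields and the ordering rule described in this prompt (only relate concepts that earlier slides have already covered, and only use concepts that exist in the ConceptDatabase) could be checked on the backend along these lines. This is a sketch under assumptions: `Purpose`, `SlidePlanSketch` and `check_plan_list` are hypothetical names, not classes from this repository, and the ConceptDatabase is modelled as a plain set of concept names.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List


class Purpose(Enum):
    INTRODUCTORY = "Introductory"
    RELATIVE = "Relative"
    EXPLORATORY = "Exploratory"
    EXPLANATIVE = "Explanative"
    EXAMPLATIVE = "Examplative"


@dataclass
class SlidePlanSketch:
    title: str
    purpose: Purpose
    purpose_statement: str
    concepts: List[str] = field(default_factory=list)


def check_plan_list(plans, concept_database):
    """Return a list of human-readable problems; an empty list means the plan list passes."""
    problems = []
    seen = set()  # concepts covered by earlier slides
    for number, plan in enumerate(plans, start=1):
        for concept in plan.concepts:
            if concept not in concept_database:
                problems.append(f"SlidePlan {number}: unknown concept {concept!r}")
        if plan.purpose is Purpose.RELATIVE:
            for concept in plan.concepts:
                if concept not in seen:
                    problems.append(f"SlidePlan {number}: relates {concept!r} before it is covered")
        seen.update(plan.concepts)
    return problems


if __name__ == "__main__":
    database = {"c_i", "c_j", "c_k"}
    plans = [
        SlidePlanSketch("Intro", Purpose.INTRODUCTORY, "Introduce c_i", ["c_i"]),
        SlidePlanSketch("Explore", Purpose.EXPLORATORY, "Explore c_j and c_k", ["c_j", "c_k"]),
        SlidePlanSketch("Relate", Purpose.RELATIVE, "Relate c_i and c_k", ["c_i", "c_k"]),
    ]
    print(check_plan_list(plans, database))
```

A check like this would let the backend reject or re-request a generated list instead of relying on the prompt alone to enforce the rules.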
41 changes: 39 additions & 2 deletions AITutor_Backend/src/TutorUtils/concepts.py
@@ -14,6 +14,43 @@
DEBUG = bool(os.environ.get("DEBUG", 0))

class ConceptDatabase(SQLSerializable):
    @staticmethod
    def build_dict(lines, current_level=0, index=0):
        """
        Recursively build a dictionary structure from a list of lines.
        """
        def parse_line(l):
            """
            Parse a line to determine its concept and indentation level.
            Indentation level is determined by the number of leading tabs.
            """
            indentation = len(l) - len(l.lstrip('\t'))
            concept = l.strip()
            return concept, indentation

        if index >= len(lines):
            return [], index

        tree = []
        while index < len(lines):
            concept, level = parse_line(lines[index])
            if level > current_level:
                # If the next level is deeper, recursively build its structure
                children, index = ConceptDatabase.build_dict(lines, level, index)
                if tree and 'refs' in tree[-1]:
                    tree[-1]['refs'].extend(children)
                elif tree:
                    tree[-1]['refs'] = children
            elif level < current_level:
                # If the next level is higher, return to the previous level
                return tree, index
            else:
                # Same level, add a new node and continue
                tree.append({'concept': concept, 'refs': []})
                index += 1

        return tree, index

    class ConceptLLMAPI:
        CURR_ENV_MAIN_CONCEPT_DELIMITER = "$CURR_ENV.MAIN_CONCEPT$" # TODO: Add tutor plan string to the llm request
        CURR_ENV_CONCEPT_LIST_DELIMITER = "$CURR_ENV.CONCEPT_LIST$"
@@ -71,7 +108,7 @@ def request_concept_data_from_llm(self, env_main_concept, env_concept_list, conc

    __CONCEPT_REGEX = re.compile(r'\`\`\`yaml([^\`]*)\`\`\`') # Matches any ```yaml ... ```
    def __init__(self, main_concept:str, tutor_plan:str = "", generation=True, max_threads=120): #TODO: Fix potential Resource lock issue
        self.concept_llm_api = self.ConceptLLMAPI("AITutor_Backend/src/TutorUtils/Prompts/KnowledgePhase/concept_prompt", tutor_plan=tutor_plan) # TODO: FIX
        self.concept_llm_api = self.ConceptLLMAPI("AITutor_Backend/src/TutorUtils/Prompts/KnowledgePhase/Concepts/concept_prompt", tutor_plan=tutor_plan) # TODO: FIX
        self.main_concept = main_concept
        self.Concepts = []
        if generation:
@@ -255,4 +292,4 @@ def to_sql(self,) -> Tuple[str, str, str]:
            Tuple[str, str, str]: (concept_name, concept_def, concept_latex)
        """
        return (self.name, self.to_tokenized_def(), self.latex,)
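
The tab-indented concept text handled by `build_dict` can also be parsed iteratively with an explicit stack, which may be easier to reason about than threading an index through recursive calls. The sketch below is an alternative illustration, not the PR's code: `parse_concept_text` is a hypothetical name, blank lines are skipped, and a line indented more than one level deeper than its predecessor is attached to the deepest available parent (both are assumptions about how malformed LLM output should be treated).

```python
def parse_concept_text(text):
    """Parse tab-indented lines into a list of {'concept', 'refs'} trees."""
    roots = []
    stack = []  # stack[d] is the most recently seen node at depth d
    for raw in text.splitlines():
        if not raw.strip():
            continue  # ignore blank lines
        depth = len(raw) - len(raw.lstrip("\t"))
        node = {"concept": raw.strip(), "refs": []}
        # Clamp the depth so a node is never deeper than one level below the last node.
        depth = min(depth, len(stack))
        # Drop every node at this depth or deeper; what remains ends with the parent.
        del stack[depth:]
        if stack:
            stack[-1]["refs"].append(node)
        else:
            roots.append(node)
        stack.append(node)
    return roots


if __name__ == "__main__":
    sample = "concept a\n\tconcept b\n\t\tconcept c\nconcept d"
    for root in parse_concept_text(sample):
        print(root)
```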

2 changes: 1 addition & 1 deletion AITutor_Backend/src/TutorUtils/questions.py
@@ -90,7 +90,7 @@ def __init__(self, num_questions, Notebank, ConceptDatabase):
        assert isinstance(num_questions, int), "Cannot Create a QuestionSuite without specifying (int) number of questions. Check the Data Type provided for num_questions"
        self.num_questions = max(min(25, num_questions), 1)
        self.Questions = []
        self.llm_api = QuestionSuite.QuestionLLMAPI("AITutor_Backend/src/TutorUtils/Prompts/KnowledgePhase/plan_question_prompt", "AITutor_Backend/src/TutorUtils/Prompts/KnowledgePhase/plan_to_question_prompt", )
        self.llm_api = QuestionSuite.QuestionLLMAPI("AITutor_Backend/src/TutorUtils/Prompts/KnowledgePhase/Questions/plan_question_prompt", "AITutor_Backend/src/TutorUtils/Prompts/KnowledgePhase/Questions/plan_to_question_prompt", )

    def generate_question_data(self, ):
        if DEBUG: print (f"Generating Question Data for {self.__ConceptDatabase.main_concept}")
2 changes: 1 addition & 1 deletion AITutor_Backend/src/TutorUtils/slides.py
@@ -373,7 +373,7 @@ def __init__(self, Notebank, ConceptDatabase):
        self.ConceptDatabase = ConceptDatabase
        self.num_slides = 0
        self.current_obj_idx = 0
        self.llm_api = SlidePlanner.SlideLLMAPI("AITutor_Backend/src/TutorUtils/Prompts/KnowledgePhase/slideplan_plan_prompt", "AITutor_Backend/src/TutorUtils/Prompts/KnowledgePhase/slideplan_to_obj_prompt","AITutor_Backend/src/TutorUtils/Prompts/KnowledgePhase/slide_plan_termination_prompt", "AITutor_Backend/src/TutorUtils/Prompts/KnowledgePhase/slide_description_prompt") # LLM API for generating slide plans
        self.llm_api = SlidePlanner.SlideLLMAPI("AITutor_Backend/src/TutorUtils/Prompts/KnowledgePhase/Slides/slideplan_plan_prompt", "AITutor_Backend/src/TutorUtils/Prompts/KnowledgePhase/Slides/slideplan_to_obj_prompt","AITutor_Backend/src/TutorUtils/Prompts/KnowledgePhase/Slides/slide_plan_termination_prompt", "AITutor_Backend/src/TutorUtils/Prompts/KnowledgePhase/Slides/slide_description_prompt") # LLM API for generating slide plans
    def to_sql(self):
        return (self.current_obj_idx, self.num_slides, [slide.to_sql() for slide in self.Slides])

2 changes: 1 addition & 1 deletion AITutor_Backend/src/tutor_env.py
@@ -303,7 +303,7 @@ def __init__(self,):
        self.slide_planner = None
        self.question_suite = None
        self._content_generated = False
        self.executor = TutorEnv.Executor(self, "AITutor_Backend/src/TutorUtils/Prompts/KnowledgePhase/main_concept_prompt", "AITutor_Backend/src/TutorUtils/Prompts/KnowledgePhase/concept_list_prompt", "AITutor_Backend/src/TutorUtils/Prompts/KnowledgePhase/notebank_filter_prompt")
        self.executor = TutorEnv.Executor(self, "AITutor_Backend/src/TutorUtils/Prompts/KnowledgePhase/Concepts/main_concept_prompt", "AITutor_Backend/src/TutorUtils/Prompts/KnowledgePhase/Concepts/concept_list_prompt", "AITutor_Backend/src/TutorUtils/Prompts/KnowledgePhase/notebank_filter_prompt")

    def step(self, input_data):
        return self.executor.process_action(input_data), self.current_state
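
Prompt paths in this PR are written out as full string literals, which makes stray separators easy to introduce when files move between folders. One possible way to centralize them is sketched below; `PROMPT_ROOT` and `prompt_path` are hypothetical names that do not exist in the repository, and the root directory is copied from the literals in the diff.

```python
from pathlib import PurePosixPath

# Assumed common root of the KnowledgePhase prompt files, taken from the paths in this diff.
PROMPT_ROOT = PurePosixPath("AITutor_Backend/src/TutorUtils/Prompts/KnowledgePhase")


def prompt_path(*parts):
    """Join sub-folder and file names onto the prompt root, collapsing repeated separators."""
    return str(PROMPT_ROOT.joinpath(*parts))


if __name__ == "__main__":
    print(prompt_path("Concepts", "main_concept_prompt"))
    print(prompt_path("notebank_filter_prompt"))
```

With a helper like this, a later reorganization of the prompt folders would only touch the sub-folder argument at each call site.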
4 changes: 3 additions & 1 deletion requirements.txt
@@ -13,4 +13,6 @@ torch
sympy
python-pdf
python-pptx
pypdf2
pypdf2
mwparserfromhell
wikipedia