The Function Representation repository contains 98 functions, on which we have trained our Curious Learner model.
Curious Learner is a generative model that aims to figure out the proper function for a task expressed in natural language. It not only finds the proper function but also tries to infer the function's parameters from the given natural-language prompt.
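For example, given a natural-language prompt, the model is expected to produce a function call of this shape (a hypothetical illustration, not actual model output):

```
Prompt: "What do you get when you divide 12 by 4?"
Output: division(x=12.0, y=4.0)
```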
In this repository we have gathered the following 98 mathematical functions, which are used for training the Curious Learner model (an illustrative sketch of a few of them follows the list).
- addition(x: int, y: int)
- subtraction(x: int, y: int)
- multiplication(x: float, y: float)
- division(x: float, y: float)
- exponentiation(x: float, y: float)
- square_root(x: float)
- floor_division(x: int, y: int)
- modulus(x: int, y: int)
- logarithm(x: float, base: float)
- sine(x: float)
- cosine(x: float)
- tangent(x: float)
- arcsine(x: float)
- arccosine(x: float)
- arctangent(x: float)
- hyperbolic_sine(x: float)
- hyperbolic_cosine(x: float)
- hyperbolic_tangent(x: float)
- logarithm_base_10(x: float)
- logarithm_base_2(x: float)
- degrees_to_radians(x: float)
- radians_to_degrees(x: float)
- gcd(x: int, y: int)
- lcm(x: int, y: int)
- isqrt(x: int)
- pow_mod(x: int, y: int, mod: int)
- ceil(x: float)
- floor(x: float)
- round(x: float)
- absolute_difference(x: float, y: float)
- greatest_value(x: float, y: float)
- smallest_value(x: float, y: float)
- product(numbers: list)
- factorial(x: int)
- is_prime(x: int)
- prime_factors(x: int)
- is_perfect_square(x: int)
- is_perfect_cube(x: int)
- mean(numbers: list)
- median(numbers: list)
- relu(x: float)
- ascending_sort(lst: list[int])
- descending_sort(lst: list[int])
- square_int(x: int)
- square(x: float)
- absolute(x: float)
- power_of_ten(x: float)
- cube(x: float)
- cube_root(x: float)
- is_even(x: int)
- is_odd(x: int)
- max_value(lst: list[int])
- min_value(lst: list[int])
- nth_root(x: float, n: int)
- geometric_mean(lst: list[float])
- is_power_of_two(x: int)
- binary_to_decimal(binary)
- decimal_to_binary(decimal)
- is_palindrome(x: str)
- sum_of_digits(x: int)
- hypotenuse(a: float, b: float)
- circle_area(radius: float)
- permutation(n: int, r: int)
- combination(n: int, r: int)
- invert_number(number: float)
- float_to_int(value: float)
- int_to_float(value: int)
- geometric_series_sum(a: float, r: float, n: int)
- sigmoid(x: float)
- cosine_similarity(vector1: list, vector2: list)
- euler_totient(n: int)
- l1_norm(vector: list)
- l2_norm(vector: list)
- average(numbers: list)
- sum(numbers: list)
- length(numbers: list)
- check_same_string(str1: str, str2: str)
- reverse_string(input_str: str)
- get_pi()
- get_e()
- calculate_dot_product(vector1: list, vector2: list)
- a_plus_b_whole_square(a: int, b: int)
- a_squared_plus_2ab_plus_b_squared(a: int, b: int)
- a_minus_b_whole_squared_plus_4ab(a: int, b: int)
- a_minus_b_whole_squared(a: int, b: int)
- a_squared_minus_2ab_plus_b_squared(a: int, b: int)
- a_plus_b_whole_squared_minus_4ab(a: int, b: int)
- a_squared_plus_b_squared(a: int, b: int)
- negative_2ab(a: int, b: int)
- positive_2ab(a: int, b: int)
- x_plus_a_times_x_plus_b(x: int, a: int, b: int)
- x_squared_plus_a_plus_b_times_x_plus_ab(x: int, a: int, b: int)
- a_cubed_plus_b_cubed(a: int, b: int)
- a_plus_b_whole_cubed_minus_3ab_times_a_plus_b(a: int, b: int)
- a_plus_b_times_a_squared_minus_ab_plus_b_squared(a: int, b: int)
- a_cubed_minus_b_cubed(a: int, b: int)
- a_minus_b_whole_cubed_plus_3ab_times_a_minus_b(a: int, b: int)
- a_minus_b_times_a_squared_plus_ab_plus_b_squared(a: int, b: int)
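As an illustration, here is a minimal sketch of what a few of these functions might look like. The names and signatures come from the list above; the actual implementations in math_functions.py may differ:

```python
import math

def addition(x: int, y: int) -> int:
    """Return the sum of two integers."""
    return x + y

def square_root(x: float) -> float:
    """Return the non-negative square root of x."""
    return math.sqrt(x)

def is_prime(x: int) -> bool:
    """Check primality by trial division up to sqrt(x)."""
    if x < 2:
        return False
    for d in range(2, math.isqrt(x) + 1):
        if x % d == 0:
            return False
    return True

def a_plus_b_whole_square(a: int, b: int) -> int:
    """Compute (a + b) squared."""
    return (a + b) ** 2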
A brief description of each major folder and file in this project:

- `src`: contains the source code of the project.
- `math_functions.py`: the class that contains all 98 functions.
- `functions_manager.py`: a Python file that holds all utility methods related to function string manipulation.
- `code_embeddings.py`: in this file we use the "microsoft/graphcodebert-base" model to convert a function string into a function embedding (see the graphcodebert-base paper).
- `sentence_embedding.py`: here we use the "all-mpnet-base-v2" model to convert sentences into embeddings.
- `README.md`: the main documentation file.
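As a rough sketch of the two embedding steps above, the snippet below shows how a function string and a sentence can be embedded with the named pre-trained models, using the Hugging Face transformers and sentence-transformers libraries. The mean pooling used for the function embedding is an assumption; code_embeddings.py may pool differently:

```python
import torch
from transformers import AutoTokenizer, AutoModel
from sentence_transformers import SentenceTransformer

# Function embedding via GraphCodeBERT.
code_tokenizer = AutoTokenizer.from_pretrained("microsoft/graphcodebert-base")
code_model = AutoModel.from_pretrained("microsoft/graphcodebert-base")

def embed_function(source: str) -> torch.Tensor:
    """Encode a function's source string into a single [768] vector."""
    inputs = code_tokenizer(source, return_tensors="pt", truncation=True)
    with torch.no_grad():
        outputs = code_model(**inputs)
    return outputs.last_hidden_state.mean(dim=1).squeeze(0)  # mean pooling (assumed)

# Sentence embedding via all-mpnet-base-v2.
sentence_model = SentenceTransformer("sentence-transformers/all-mpnet-base-v2")

def embed_sentence(text: str) -> torch.Tensor:
    """Encode a natural-language prompt into a [768] vector."""
    return sentence_model.encode(text, convert_to_tensor=True)

print(embed_function("def addition(x: int, y: int): return x + y").shape)
print(embed_sentence("add five and three").shape)
```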
Please ignore siamese.ipynb, siamese_network.py, sn_dataloader.py, sn_dataset.py, and the other notebooks. Initially we tried to use a Siamese network to model relationships between function embeddings, but that approach needs further exploration.
This repository is licensed under the GNU Affero General Public License - see the LICENSE.md file for details.
We have seven different types of embeddings; let's describe them briefly, one by one.
- Initial Word Embedding (IWE): We do not use our own tokenizer; rather, we use the "all-mpnet-base-v2" pre-trained model for the words provided by the input/output parser. IWE is a [1*768] tensor for each word.
- Function Token Embedding (FTE): We do not use our own tokenizer here either; rather, we use the "microsoft/graphcodebert-base" pre-trained model to obtain the function token embeddings (FTE).
- ALiBi Embedding (PE): Attention with Linear Biases embeds positional information directly in the attention layer; it helps extend the usable attention span, which means longer decoding lengths. Paper: Train Short, Test Long.
- Categorical Embedding (CE): Injects token-type information into the main embedding. It encodes the category type, subtype, and sub-subtype in a single embedding.
- Task Embedding (TE): Encodes the type of task in an embedding.
- Frequency Embedding (FE): The Fourier transform of the sum of the categorical and task embeddings. It is passed along as linked information (a tuple) with the main embedding.
- Combined Embedding: Combining the token embedding with the categorical and task embeddings yields the combined embedding (see the sketch after this list).
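To make the last two items concrete, below is a minimal sketch of how the frequency and combined embeddings could be computed with PyTorch. The tensor shapes and the summation used as the combination rule are assumptions for illustration, not necessarily what the model does:

```python
import torch

def frequency_embedding(categorical: torch.Tensor, task: torch.Tensor) -> torch.Tensor:
    """FE: Fourier transform (real FFT) of the sum of the categorical
    and task embeddings; we keep the real part for illustration."""
    return torch.fft.rfft(categorical + task).real

def combined_embedding(token: torch.Tensor,
                       categorical: torch.Tensor,
                       task: torch.Tensor) -> tuple[torch.Tensor, torch.Tensor]:
    """Combine the token embedding with CE and TE; carry FE along
    as linked information (a tuple), as described above."""
    main = token + categorical + task  # summation is an assumed combination rule
    fe = frequency_embedding(categorical, task)
    return main, fe

# Example with 768-dimensional embeddings, matching the [1*768] IWE size.
tok, cat, tsk = torch.randn(768), torch.randn(768), torch.randn(768)
main, fe = combined_embedding(tok, cat, tsk)
print(main.shape, fe.shape)  # torch.Size([768]) torch.Size([385])
```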