ScreenSage is an AI test case generator. Simply upload screenshots of your new app or software, and it will generate a series of elaborate test cases for each functionality. This saves you the effort of manually describing test cases and reduces the risk of leaving any application component untested. Link to demo video (https://youtu.be/4AButkbHXUg)
In the development of ScreenSage, choosing the right multimodal architecture was of utmost importance, including selecting the appropriate language models. ScreenSage consists of two models: the LLaVA 1.5 7B vision-language model and any one of these large language models:
Gemma2-9b-it, Gemma-7b-it, LLaMA3-70b-8192, LLaMA3-8b-8192, Mixtral-8x7b-Instruct-v0.1
This is what the architecture and flow of ScreenSage look like:
The image and text are fed into the LLaVA model, which generates an image description; that description is then fed as input to the language model to generate test cases. In short: images become image descriptions, and image descriptions become test cases.
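Here is a minimal Python sketch of that two-stage flow. The function names and structure are illustrative only, not the actual ScreenSage source code:

```python
# Illustrative two-stage pipeline: screenshot -> description -> test cases.
# Function names here are hypothetical, not taken from the ScreenSage code.

VISION_PROMPT = "Describe contents of this screenshot from an app User Interface."

def describe_image(image_bytes: bytes, prompt: str) -> str:
    """Stage 1: the LLaVA 1.5 7B vision model turns a screenshot into text."""
    raise NotImplementedError("call the LLaVA vision model here")

def generate_test_cases(description: str, model: str) -> str:
    """Stage 2: the selected LLM turns the description into test cases."""
    raise NotImplementedError("call the chosen Groq-hosted LLM here")

def screensage_flow(image_bytes: bytes, user_context: str, model: str) -> str:
    description = describe_image(image_bytes, f"{VISION_PROMPT} {user_context}")
    return generate_test_cases(description, model)
```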
- Streamlit: For hosting ScreenSage and API management.
- Groq: An AI console that gives access to cutting-edge models. It is simple to use, free of cost, and still delivers quality results (API response speed is very fast).
- base64: A Python library used for encoding image bytes into printable characters.
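For example, an uploaded screenshot can be converted to a base64 string before it is sent to the vision model. The snippet below is a sketch that assumes Streamlit's file uploader:

```python
import base64

import streamlit as st

def encode_image(image_bytes: bytes) -> str:
    """Encode raw image bytes as a base64 string of printable characters,
    so the screenshot can be embedded in a text-based API request."""
    return base64.b64encode(image_bytes).decode("utf-8")

# Illustrative usage with Streamlit's uploader (supports multiple files):
uploaded_files = st.file_uploader(
    "Upload screenshots", type=["png", "jpg", "jpeg"], accept_multiple_files=True
)
encoded_images = [encode_image(f.getvalue()) for f in uploaded_files or []]
```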
- Uploading Multiple Images: ScreenSage can process multiple images at once and will generate test cases for the functionality each image represents.
- Chat History: ScreenSage maintains a chat history for each session so that you can go back and refer to it in case you missed any test case.
- Context: You can additionally provide context for generating your test cases.
- Model Selection: ScreenSage provides a range of models to choose from, including the latest cutting-edge models.
- Token Size Selection: ScreenSage lets you change the maximum token count for each model, which determines how elaborate the generated test cases will be (see the sketch after this list).
- Comprehensive Test Cases: ScreenSage produces coherent, easy-to-understand test cases and attempts to cover all aspects of a given functionality.
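As a sketch of how the model and token-size choices might map onto an API call, here is an example using the Groq Python SDK. The exact call in ScreenSage may differ; the function name and defaults here are assumptions:

```python
from groq import Groq

client = Groq(api_key="YOUR_GROQ_API_KEY")  # replace with your own key

def generate_test_cases(description: str,
                        model: str = "llama3-70b-8192",
                        max_tokens: int = 1024) -> str:
    """Turn a screenshot description into test cases using the selected
    Groq-hosted model; max_tokens bounds how elaborate the output can be."""
    response = client.chat.completions.create(
        model=model,            # any of the models listed above
        max_tokens=max_tokens,  # the "token size" selected in the UI
        messages=[
            {"role": "system",
             "content": "You are a helpful assistant that describes test cases "
                        "for any app features, based on the descriptions of the "
                        "screenshots of the app."},
            {"role": "user", "content": description},
        ],
    )
    return response.choices[0].message.content
```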
The multimodal pipeline is given three prompts. Two of them are static (fixed prompts used every time the model is run) and one is added at runtime.
The prompt given to the LLaVA vision model is (prompt1):
"Describe contents of this screenshot from an app User Interface."
This gives the vision model a surface-level understanding of the contents of the images. The second prompt is given to the large language model for test case generation (prompt2):
"You are a helpful assistant that describes test cases for any app features, based on the descriptions of the screenshots of the app.
Each test case should include:
+ Test Case ID - Assign a unique identifier to the test case.
+ Description - Describe the test case, outlining what it is designed to do.
+ Pre-conditions - Document any pre-conditions that need to be in place for the test case to run properly. It may include initial configuration settings or manually executing some previous tests.
+ Testing Steps - Document the detailed steps necessary to execute the test case. This includes deciding which actions should be taken to perform the test.
+ Expected Result - Provide the expected result of the test. This is the result the tester is looking to verify."
The third prompt (prompt3) is simply the context that you provide, and it is also given to the LLaVA model. Prompt3 is appended to prompt1 at runtime when you run the solution.
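A small sketch of how the three prompts fit together at runtime (variable names are illustrative, not from the actual source code):

```python
# Illustrative prompt assembly for the ScreenSage pipeline.

PROMPT1 = "Describe contents of this screenshot from an app User Interface."

PROMPT2 = ("You are a helpful assistant that describes test cases for any app "
           "features, based on the descriptions of the screenshots of the app. "
           "...")  # full prompt2 text as shown above

def build_vision_prompt(user_context: str = "") -> str:
    """prompt3 is whatever context the user types; it is appended to prompt1."""
    prompt3 = user_context.strip()
    return f"{PROMPT1} {prompt3}" if prompt3 else PROMPT1
```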
- Home Screen
- Selecting Image Files
- Describe Testing Instructions
Clone the project

```bash
git clone https://github.com/Progpr/myracle-case-study
```

Go to the project directory

```bash
cd myracle-case-study
```

Install dependencies

```bash
pip install -r requirements.txt
```

Run the Streamlit app

```bash
streamlit run app3.py
```