This repository contains a Python-based tool for text preprocessing and categorization, specifically designed to process and clean text files from a sentiment analysis dataset. The program uses NLTK and regular expressions to clean, tokenize, and remove stopwords from the text. The processed data is then saved into categorized folders for further analysis.
- Reads and processes text files from input directories.
- Cleans text by removing punctuation, converting to lowercase, and eliminating stopwords.
- Categorizes processed text into positive and negative sentiment folders.
- Supports batch processing of text files within specified directories.
A:.
├───NEGATIVO
├───POSITIVO
└───review_polarity
└───txt_sentoken
├───neg
└───pos
- Python 3.12 or higher
- Libraries: nltk, re (re is part of the Python standard library)
- Clone this repository:
git clone https://github.com/KPlanisphere/text-processing-tool.git
- Install the required dependencies:
pip install nltk
- Download the NLTK stopwords corpus:
import nltk
nltk.download('stopwords')
- Define the input and output directories for positive and negative text files:
input_dir1 = r'path_to_positive_files'
input_dir2 = r'path_to_negative_files'
output_dir1 = r'path_to_output_positive_files'
output_dir2 = r'path_to_output_negative_files'
- Run the script:
python lab9.py
- Input Directory: review_polarity/txt_sentoken/
  - Positive: pos
  - Negative: neg
- Output Directory:
  - Processed positive files: POSITIVO/
  - Processed negative files: NEGATIVO/
- File Loading: The script reads text files from the specified directories.
- Text Processing (see the sketch after this list):
- Removes punctuation.
- Converts all text to lowercase.
- Filters out English stopwords.
- Output: The cleaned text is saved into the corresponding output directory (POSITIVO/ for positive reviews, NEGATIVO/ for negative reviews).
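A minimal sketch of the cleaning step, assuming the NLTK stopwords corpus has already been downloaded; the helper name clean_text is illustrative and not necessarily the name used in lab9.py:

```python
import re
from nltk.corpus import stopwords

# English stopword set from NLTK (requires nltk.download('stopwords') beforehand).
STOPWORDS = set(stopwords.words('english'))

def clean_text(text):
    """Strip punctuation, lowercase the text, and drop English stopwords."""
    text = re.sub(r'[^\w\s]', '', text)   # remove punctuation
    words = text.lower().split()          # lowercase and split on whitespace
    return ' '.join(w for w in words if w not in STOPWORDS)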
The main script, lab9.py, does the following (a rough sketch of its batch loop follows the list):
- Processes files from the sentiment dataset.
- Categorizes and saves the cleaned text.
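The batch loop could look roughly like the sketch below. It reuses the clean_text helper from the earlier sketch and the input_dir*/output_dir* variables defined in the configuration step; the function name process_directory and the .txt filter are assumptions for illustration.

```python
import os

def process_directory(input_dir, output_dir):
    """Clean every .txt file in input_dir and write the result to output_dir."""
    os.makedirs(output_dir, exist_ok=True)
    for filename in os.listdir(input_dir):
        if not filename.endswith('.txt'):
            continue
        with open(os.path.join(input_dir, filename), encoding='utf-8') as src:
            cleaned = clean_text(src.read())
        with open(os.path.join(output_dir, filename), 'w', encoding='utf-8') as dst:
            dst.write(cleaned)

# Positive and negative reviews go to their respective output folders.
process_directory(input_dir1, output_dir1)  # e.g., pos -> POSITIVO/
process_directory(input_dir2, output_dir2)  # e.g., neg -> NEGATIVO/
```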