An RSS news aggregator that scores headlines by source authority and clusters similar ones using sentence embeddings and DBSCAN.
- Smart Feed Aggregation: Automatically fetches and processes articles from multiple RSS feeds
- Impact Scoring: Scores each source on a four-level credibility scale (1 to 5 points)
- Headline Clustering: Groups similar headlines using SentenceTransformers embeddings and DBSCAN
- Flexible Output: Supports multiple output destinations (Google Forms, Email, Slack, Cloud Services)
- Temporal Filtering: Configurable timeframe for article inclusion
- Source Classification: Four-tier authority classification system
- Entity Recognition: Named entity extraction from headlines
- Python 3.9 or higher
- pip package manager
- Virtual environment (recommended)
# Clone the repository
git clone https://github.com/CartesianXR7/Meridian-Insights.git
cd Meridian-Insights
# Create and activate virtual environment
python -m venv venv
source venv/bin/activate # On Windows: venv\Scripts\activate
# Install dependencies
pip install -r requirements.txt
# Install NLP models
python -m spacy download en_core_web_sm
python -m nltk.downloader vader_lexicon stopwords
- TIME_DELTA_HOURS: Number of hours to look back (default: 72)
- TRANSFORMERS_CACHE: Cache directory for transformer models
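As a rough sketch of how the look-back window might be applied, the snippet below reads TIME_DELTA_HOURS from the environment (falling back to the documented default of 72) and keeps only articles published inside that window. The function names and the article shape (a dict with a timezone-aware "published" datetime) are assumptions for illustration, not the project's actual internals.

```python
import os
from datetime import datetime, timedelta, timezone

DEFAULT_TIME_DELTA_HOURS = 72  # default documented above


def lookback_hours(env=None):
    """Read TIME_DELTA_HOURS, falling back to the default on bad input."""
    env = os.environ if env is None else env
    raw = env.get("TIME_DELTA_HOURS", "")
    try:
        hours = int(raw)
    except ValueError:
        # Unset or non-numeric value: use the default.
        return DEFAULT_TIME_DELTA_HOURS
    return hours if hours > 0 else DEFAULT_TIME_DELTA_HOURS


def filter_recent(articles, hours, now=None):
    """Keep articles whose 'published' timestamp is within the window.

    `articles` is a hypothetical list of dicts with a timezone-aware
    'published' datetime; the real project may store this differently.
    """
    now = now or datetime.now(timezone.utc)
    cutoff = now - timedelta(hours=hours)
    return [a for a in articles if a["published"] >= cutoff]
```

Passing `now` explicitly keeps the filter deterministic in tests; in normal use it can be omitted.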
The system uses four authority levels for sources:
- High Impact (5 points)
- Medium-High Impact (3 points)
- Medium Impact (2 points)
- Medium-Low Impact (1 point)
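To make the four levels concrete, here is a minimal sketch of how a source's point value might be looked up from an article URL. The tier keys, the domain-to-tier mapping, the placeholder domains, and the choice to score unlisted sources as 0 are all assumptions; the project's real IMPACT_DOMAINS dictionary may be shaped differently.

```python
from urllib.parse import urlparse

# Point values for the four documented authority levels.
TIER_POINTS = {
    "high": 5,
    "medium_high": 3,
    "medium": 2,
    "medium_low": 1,
}

# Hypothetical domain-to-tier mapping using placeholder domains only.
EXAMPLE_DOMAIN_TIERS = {
    "example.com": "high",
    "another-source.com": "medium",
}


def impact_score(article_url, domain_tiers=EXAMPLE_DOMAIN_TIERS, default=0):
    """Return the authority points for the URL's domain.

    Unlisted domains get `default` (0 here, which is an assumption).
    """
    host = (urlparse(article_url).hostname or "").lower()
    if host.startswith("www."):
        host = host[4:]  # treat www.example.com like example.com
    tier = domain_tiers.get(host)
    return TIER_POINTS.get(tier, default)
```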
from meridian import MeridianAggregator
# Initialize aggregator
aggregator = MeridianAggregator()
# Run aggregation
results = aggregator.run()
# Configure for Google Forms output
aggregator.configure_output(
    output_type="google_forms",
    form_id="your-form-id"
)
# Or configure for multiple outputs
aggregator.configure_output([
    {"type": "google_forms", "form_id": "your-form-id"},
    {"type": "slack", "webhook_url": "your-webhook-url"}
])
To add new RSS feeds, modify the rss_feeds list in the configuration:
rss_feeds = [
    "https://example.com/feed",
    "https://another-source.com/rss"
]
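For a sense of how a feed list like this could be fetched concurrently, the sketch below uses asyncio.gather with a concurrency cap. It is illustrative only: `fetch_all`, `fake_fetch`, and `max_concurrency` are hypothetical names, and the fetcher is a stub with no network access so the example stays self-contained. A real fetcher would perform an HTTP request instead.

```python
import asyncio


async def fetch_all(urls, fetch_one, max_concurrency=5):
    """Fetch every URL concurrently, capped by a semaphore.

    `fetch_one` is any coroutine function that takes a URL and returns
    the feed body. Returns (ok, failed): url -> body and url -> exception.
    """
    semaphore = asyncio.Semaphore(max_concurrency)

    async def guarded(url):
        async with semaphore:
            return await fetch_one(url)

    # return_exceptions=True keeps one bad feed from cancelling the rest.
    outcomes = await asyncio.gather(
        *(guarded(u) for u in urls), return_exceptions=True
    )
    ok, failed = {}, {}
    for url, outcome in zip(urls, outcomes):
        if isinstance(outcome, Exception):
            failed[url] = outcome
        else:
            ok[url] = outcome
    return ok, failed


async def fake_fetch(url):
    """Stand-in fetcher for demonstration; simulates one failing feed."""
    await asyncio.sleep(0)
    if "another-source" in url:
        raise ConnectionError(f"could not reach {url}")
    return f"<rss>stub for {url}</rss>"
```

Calling `asyncio.run(fetch_all(rss_feeds, fake_fetch))` returns the two dictionaries, so a failing feed is reported without discarding the feeds that succeeded.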
Modify the IMPACT_DOMAINS dictionary to adjust source credibility scores.
We welcome contributions! Please see our Contributing Guidelines for details.
# Install development dependencies
pip install -e ".[dev]"
# Run tests
pytest
# Run linting
flake8
black .
- Uses DBSCAN clustering
- Sentence embeddings via SentenceTransformers
- Configurable similarity thresholds
- Text preprocessing
- Named entity recognition
- Sentiment analysis
- Semantic similarity computation
- Async RSS feed fetching
- Optimized clustering for large datasets
- Configurable caching for embeddings
- No sensitive credentials in source code
- Safe handling of external connections
- Input sanitization for all data sources
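To illustrate the DBSCAN clustering with a similarity threshold listed above, here is a small pure-Python sketch over toy vectors. It is not the project's implementation: the 3-dimensional vectors stand in for real sentence embeddings, the `eps` and `min_samples` values are arbitrary, and a library implementation would normally replace the hand-written `dbscan` function.

```python
import math

NOISE = -1


def cosine_distance(u, v):
    """1 - cosine similarity; assumes both vectors are non-zero."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)


def dbscan(vectors, eps, min_samples):
    """Label each vector with a cluster id, or NOISE (-1).

    A point is a core point if at least `min_samples` points
    (itself included) lie within cosine distance `eps`.
    """
    n = len(vectors)
    labels = [None] * n

    def neighbors(i):
        return [j for j in range(n)
                if cosine_distance(vectors[i], vectors[j]) <= eps]

    cluster = -1
    for i in range(n):
        if labels[i] is not None:
            continue
        seeds = neighbors(i)
        if len(seeds) < min_samples:
            labels[i] = NOISE  # may later be claimed as a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point: join, do not expand
            if labels[j] is not None:
                continue
            labels[j] = cluster
            reachable = neighbors(j)
            if len(reachable) >= min_samples:
                queue.extend(reachable)  # j is a core point: keep growing
    return labels


# Toy "embeddings": two pairs of near-duplicates and one outlier.
toy_embeddings = [
    [1.0, 0.0, 0.0],
    [0.9, 0.1, 0.0],
    [0.0, 1.0, 0.0],
    [0.0, 0.95, 0.05],
    [0.0, 0.0, 1.0],
]
toy_labels = dbscan(toy_embeddings, eps=0.3, min_samples=2)
```

Lowering `eps` makes clusters stricter (headlines must be more similar to group together), while raising `min_samples` pushes more isolated headlines into the noise label.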
This project is licensed under the MIT License - see the LICENSE file for details.
- Create an Issue for bug reports
- Start a Discussion for questions
- Email: Stephen@wavebound.io
- All the open-source projects that made this possible
- Contributors and maintainers
- The NLP and RSS communities