Glossary
Technical Terms
- CI/CD
Continuous Integration / Continuous Deployment. DevOps practice for automating code testing and deployment.
- Coverage
Measure of code percentage tested by unit tests. Project objective: ≥90%.
- DuckDB
High-performance columnar OLAP (Online Analytical Processing) database. SQLite alternative for analytics.
- DNAT
Destination Network Address Translation. iptables technique to redirect network traffic directly to port 3910, bypassing reverse proxy for 10x performance gain.
- Flake8
Python tool verifying PEP8 compliance (standard Python code style).
- Parquet
Compressed columnar file format optimized for analytics. 5-10x more performant than CSV.
- Polars
High-performance Python DataFrame library, Pandas alternative. Rust engine, Arrow format.
- Pytest
Most popular Python unit testing framework.
- S3 Garage
Self-hosted Amazon S3 protocol implementation. Used for project dataset storage.
- Streamlit
Python framework for creating interactive data science web apps without JavaScript.
- TTL
Time To Live. Cache duration before expiration (project: 3600s = 1h).
- uv
Modern ultra-fast Python package manager. pip replacement, 10-100x faster.
Project Terms
- PREPROD
Pre-production environment for testing before PROD deployment. URL: https://mangetamain.lafrance.io/, port 8500.
- PRODUCTION
Stable production environment. URL: https://backtothefuturekitchen.lafrance.io/, port 8501.
- Recipes Clean
Cleaned dataset of 178,265 Food.com recipes (1999-2018). File: recipes_clean.parquet (250 MB).
- Ratings Longterm
Dataset of 1.1M+ user interactions. File: ratings_longterm.parquet (180 MB).
- Back to the Kitchen
Project name and visual identity (orange/black palette).
- Runner Self-Hosted
GitHub Actions machine running CI/CD directly on dataia VM (avoids VPN).
Analytics Terms
- Trends Analysis
Module
analyse_trendlines_v2studying recipe evolution 1999-2018 (volume, duration, complexity).- Seasonal Analysis
Module
analyse_seasonalityidentifying seasonal patterns (winter/summer, December peaks).- Weekend Analysis
Module
analyse_weekendcomparing weekday vs weekend publications (Monday +45%, Saturday -49%).- Ratings Analysis
Module
analyse_ratingsstudying user rating distribution (78% positive bias = 5★).- Complexity Score
0-10 score calculated from number of steps + ingredients + preparation time.
- Visual Identity
Unified color palette:
ORANGE_PRIMARY(#FF8C00),BACKGROUND_MAIN(#1E1E1E).
Infrastructure Terms
- Dataia
VM hosting PREPROD/PROD services, Garage S3, GitHub Actions runner. IP: 192.168.80.202.
- Docker Compose
Docker container orchestration tool. Files: docker-compose.yml (PREPROD), docker-compose-prod.yml (PROD).
- Health Check
HTTP endpoint verifying application health. URL:
/_stcore/health. Retry 3 times, timeout 10s.- Loguru
Modern Python logging library. Config: 2 files (debug.log, errors.log), automatic rotation.
- Discord Webhook
Real-time CI/CD notifications in #deployments channel.
Acronyms
- API
Application Programming Interface. Documented Python module reference.
- ASCII
American Standard Code for Information Interchange. Text diagrams used in documentation.
- CSV
Comma-Separated Values. Tabular file format (replaced by Parquet for performance).
- EDA
Exploratory Data Analysis. Jupyter notebooks for data exploration (
00_eda/).- FAQ
Frequently Asked Questions. Common questions page.
- JSON
JavaScript Object Notation. Data exchange format.
- OLAP
Online Analytical Processing. Database type optimized for analytics (DuckDB).
- PEP8
Python Enhancement Proposal 8. Standard Python code style guide.
- PR
Pull Request. GitHub code merge request.
- REST
Representational State Transfer. API architecture (not used in project, web app).
- RST
reStructuredText. Sphinx documentation markup format.
- SQL
Structured Query Language. Database query language.
- SSH
Secure Shell. Protocol for secure dataia VM connection.
- TTL
Time To Live. Cache duration.
- UI
User Interface. Streamlit user interface.
- URL
Uniform Resource Locator. Web address.
- VPN
Virtual Private Network. Virtual private network (dataia accessible via VPN).
Common Commands
uv syncInstalls all project dependencies from pyproject.toml.
uv run streamlit runLaunches Streamlit application in development mode.
pytestRuns unit tests.
flake8Verifies PEP8 compliance.
blackAutomatically formats Python code (opinionated style).
docker-compose up -dStarts Docker containers in background.
git push origin mainPushes commits to main branch, triggers CI/CD.
gh run watchMonitors GitHub Actions workflow execution in real time.
ssh dataiaSSH connection to dataia VM.
aws s3 cp --profile s3fastCopies files from/to S3 Garage.
Key Values
Quality Objectives
Coverage: ≥90% (achieved: 93%)
Tests: 118 tests (83 unit + 35 infrastructure)
PEP8: 100% compliance
Docstrings: Google Style, 100% functions
Performance Metrics
S3 without DNAT: 50-100 MB/s
S3 with DNAT: 500-917 MB/s (10x)
Streamlit cache: <0.1s (after first load 5-10s)
CI build: ~2-3 minutes
CD PREPROD: ~40 seconds
CD PROD: ~1 minute
Project Data
Recipes: 178,265 recipes
Ratings: 1,132,367 interactions
Users: 25,076 contributors
Period: 1999-2018 (20 years)
Tags: ~500 unique tags
Storage: ~450 MB compressed Parquet
Configurations
Python: 3.13.7
Streamlit: 1.50.0
Plotly: 5.24.1
DuckDB: 1.4.0
Polars: 1.19.0
Cache TTL: 3600s (1h)
See Also
Quick Start Guide - Quick start guide
FAQ - Frequently Asked Questions - Frequently asked questions
Technical Architecture - Detailed technical architecture
api/index - Complete API reference