README
This library scrapes data from CSV-style files into MariaDB, recording an 'as-of' time for each scrape so that the data can be tracked over time.
Main features are:
Uploading data from strongly-typed Polars DataFrames.
Querying data into Polars DataFrames, with column types inferred from the database schema.
A scrape specification that:
Defines pipelines for typing, enriching, and normalizing data before uploading.
Allows construction of the ‘as-of’ time from file attributes or as a function of the input columns.
Catalogs the history of scrape inputs to prevent duplication.
Supports per-file transactional scraping (either the processing for a file succeeds, or the transaction is rolled back).
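The last three features (as-of time from file attributes, an input catalog, and per-file transactions) can be illustrated with a minimal standard-library sketch. This is not this library's API: the function names, the table layout, the two-column key,value file format, and the use of sqlite3 as a stand-in for MariaDB are all assumptions made for illustration.

```python
import csv
import hashlib
import sqlite3
from datetime import datetime, timezone
from pathlib import Path


def create_schema(conn: sqlite3.Connection) -> None:
    """Create the hypothetical catalog and data tables used by this sketch."""
    conn.execute("CREATE TABLE IF NOT EXISTS catalog (digest TEXT PRIMARY KEY, name TEXT)")
    conn.execute("CREATE TABLE IF NOT EXISTS readings (as_of TEXT, key TEXT, value REAL)")


def as_of_from_mtime(path: Path) -> datetime:
    """Derive an 'as-of' time from a file attribute (here, the modification time)."""
    return datetime.fromtimestamp(path.stat().st_mtime, tz=timezone.utc)


def scrape_file(conn: sqlite3.Connection, path: Path) -> bool:
    """Load one CSV file atomically.

    Returns:
        True if the file was loaded, False if its contents were already cataloged.

    Raises:
        ValueError: If a value cannot be parsed; nothing from the file is kept.
    """
    digest = hashlib.sha256(path.read_bytes()).hexdigest()
    as_of = as_of_from_mtime(path).isoformat()
    with conn:  # commits on success, rolls back if any statement or parse step raises
        if conn.execute("SELECT 1 FROM catalog WHERE digest = ?", (digest,)).fetchone():
            return False  # same content seen before: skip to prevent duplication
        conn.execute("INSERT INTO catalog (digest, name) VALUES (?, ?)", (digest, path.name))
        with path.open(newline="") as handle:
            for row in csv.DictReader(handle):
                conn.execute(
                    "INSERT INTO readings (as_of, key, value) VALUES (?, ?, ?)",
                    (as_of, row["key"], float(row["value"])),
                )
    return True
```

Because the catalog entry and the data rows are written in the same transaction, a file that fails partway leaves no trace and can be retried after it is fixed.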
Development Setup
Create a virtual environment:
python3 -m venv .venv
source .venv/bin/activate
Install development dependencies:
poetry install --with dev
Run tests:
poetry run pytest
Build the docs. The documentation is generated in the docs/_build/html directory:
cd docs && poetry run make html
Code Style
This project follows these code style guidelines:
Use type hints for all function parameters and return values
Follow PEP 8 style guide
Use Google-style docstrings
Keep functions focused and single-purpose
Write comprehensive tests for new features
Run make check to check the code style.
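As a rough illustration of the guidelines above, a small function with full type hints and a Google-style docstring might look like the following. The function normalize_header is hypothetical and is not part of this library.

```python
def normalize_header(name: str) -> str:
    """Convert a CSV column header to snake_case.

    Args:
        name: The raw header text as it appears in the file.

    Returns:
        The header stripped and lowercased, with each run of whitespace
        replaced by a single underscore.
    """
    return "_".join(name.strip().lower().split())
```

The function does one thing, which keeps it easy to test in isolation.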
Contributing
Fork the repository
Create a feature branch
Commit your changes
Push to the branch
Create a Pull Request
License
This project is licensed under the terms specified in the LICENSE file.