Roadmap

Upcoming features and improvements for Words to Data

Python Bindings

In Progress

Give Python developers direct access to Words to Data's parsing and diffing capabilities through native Python bindings built with PyO3.

Key Features

  • Native Python API for parsing USC documents and bills
  • Pythonic interfaces for diff computation
  • Full type hints and documentation
  • PyPI distribution for easy installation

Use Cases

  • Data science and legal analytics workflows
  • Jupyter notebook integration for research
  • Web scraping and automated legal document processing

Enhanced Bill Parsing

Planned

Extract richer metadata and structured information from Public Law documents, including sponsorship, voting records, and legislative intent.

Planned Enhancements

  • Extract bill sponsors and co-sponsors
  • Parse committee assignments and referrals
  • Capture legislative history and amendments
  • Identify effective dates and sunset provisions
  • Extract cross-references to regulations and case law
  • Detect amending actions and their intent more reliably

Benefits

  • Comprehensive legislative tracking
  • Better understanding of bill lifecycle
  • Enhanced research capabilities for legal scholars

Legal-Specific Diff System

Planned

A specialized diff algorithm designed to understand and highlight legal document changes with semantic awareness of legal structures and terminology.

Key Capabilities

  • Detection of substantive vs. formatting changes
  • Identification of definitional changes and their cascading effects
  • Recognition of cross-reference updates and their implications
  • Tracking of effective date modifications
  • Detection of renumbering and reorganization patterns
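
As an illustration of the first capability above, the following is a minimal sketch of one way to separate formatting-only changes from substantive ones: normalize both versions and compare the results. It is not the project's planned algorithm, and the names `normalize_formatting` and `classify_change` are hypothetical, introduced only for this example.

```python
import re
import unicodedata

# Hypothetical helper names; this is an illustrative sketch, not Words to Data's API.
_QUOTES = str.maketrans({"“": '"', "”": '"', "‘": "'", "’": "'"})


def normalize_formatting(text: str) -> str:
    """Collapse differences that are purely presentational."""
    # NFKC folds compatibility characters (e.g. non-breaking spaces) into plain forms.
    text = unicodedata.normalize("NFKC", text)
    # Treat curly and straight quotation marks alike.
    text = text.translate(_QUOTES)
    # Collapse any run of whitespace (spaces, tabs, newlines) into one space.
    return re.sub(r"\s+", " ", text).strip()


def classify_change(old: str, new: str) -> str:
    """Return 'unchanged', 'formatting', or 'substantive' for a pair of passages."""
    if old == new:
        return "unchanged"
    if normalize_formatting(old) == normalize_formatting(new):
        return "formatting"
    return "substantive"
```

Under this sketch, a passage that is only re-wrapped across lines is classified as `formatting`, while replacing "shall" with "may" is classified as `substantive`. A real legal diff would need further rules (for example, for renumbering), which this example does not attempt.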

Advanced Features

  • Legal citation normalization and matching
  • Smart handling of insertions, strikes, and replacements
  • Classification of amendment types (expansion, restriction, clarification)
  • Change impact scoring and severity assessment
  • Support for conditional and contingent modifications
  • Multi-version change tracking and lineage
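
To make citation normalization and matching concrete, here is a small sketch that rewrites common spellings of U.S. Code citations into one canonical form so that they compare equal. The regular expression covers only a few simple variants and is an assumption made for illustration; the function names are hypothetical and do not describe the project's implementation.

```python
import re

# Matches variants such as "42 USC 1983", "42 U.S.C. § 1983", "42 U.S.C. Sec. 1395w-4".
# Illustrative pattern only; real citations have many more forms (ranges, subsections).
USC_CITATION = re.compile(
    r"(?P<title>\d+)\s+U\.?\s?S\.?\s?C\.?\s*"      # title number and "U.S.C."
    r"(?:§+|[Ss]ec(?:tion|\.)?)?\s*"               # optional section marker
    r"(?P<section>\d+[A-Za-z]*(?:-\d+[A-Za-z]*)*)" # section, e.g. 1983 or 1395w-4
)


def normalize_usc_citations(text: str) -> str:
    """Rewrite each recognized citation as '<title> U.S.C. § <section>'."""
    def _canonical(match: re.Match) -> str:
        return f"{match['title']} U.S.C. § {match['section']}"

    return USC_CITATION.sub(_canonical, text)


def extract_usc_citations(text: str) -> list[tuple[int, str]]:
    """Return (title, section) pairs for every recognized citation."""
    return [(int(m["title"]), m["section"]) for m in USC_CITATION.finditer(text)]
```

With citations reduced to `(title, section)` pairs, two documents that cite the same provision in different styles can be matched by simple equality.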

Benefits

  • More accurate change detection for legal documents
  • Better understanding of legislative intent
  • Reduced false positives from formatting changes
  • Enhanced compliance and regulatory tracking
  • Improved legal research and analysis workflows

Pre-built Legal Datasets

Planned

Curated, labeled datasets of parsed legal documents ready for machine learning, research, and analysis.

Dataset Categories

  • Complete USC title snapshots with temporal versions
  • Annotated bill amendments with classified action types
  • Legislative change tracking corpus (multi-year)
  • Topic-labeled legal document collections
  • Diff datasets showing legislative evolution

Applications

  • Training legal language models
  • Policy research and analysis
  • Legal analytics and trend detection
  • Academic research benchmarking

Congressional Vote & Member Tracking

Planned

A comprehensive system for tracking members of Congress, their voting records, and their behavioral patterns on legislation.

Core Capabilities

  • Parse and store voting records from congress.gov
  • Link votes to specific bills and amendments
  • Track member sponsorship and co-sponsorship patterns
  • Track committee membership and activity
  • Analyze historical voting records
  • Record party affiliation and district information

Analysis Features

  • Voting pattern clustering and similarity analysis
  • Ideology scoring based on voting behavior
  • Bill co-sponsorship network analysis
  • Legislative effectiveness metrics
  • Topic-based voting alignment tracking
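
One simple form of voting similarity analysis is the agreement rate: the share of roll calls on which two members both voted and cast the same vote. The sketch below assumes each member's record is a plain dictionary mapping a roll-call identifier to a vote string. That data shape and the function names are assumptions for illustration, not the planned design.

```python
from itertools import combinations
from typing import Optional

# Assumed shape: {roll_call_id: vote}, e.g. {"rc-1": "yea", "rc-2": "nay"}.
VoteRecord = dict[str, str]


def agreement_rate(votes_a: VoteRecord, votes_b: VoteRecord) -> Optional[float]:
    """Fraction of shared roll calls on which both members voted the same way.

    Returns None when the two members have no roll calls in common.
    """
    shared = [roll_call for roll_call in votes_a if roll_call in votes_b]
    if not shared:
        return None
    agreements = sum(votes_a[rc] == votes_b[rc] for rc in shared)
    return agreements / len(shared)


def pairwise_agreement(records: dict[str, VoteRecord]) -> dict[tuple[str, str], Optional[float]]:
    """Agreement rate for every unordered pair of members, keyed by sorted name pair."""
    return {
        (a, b): agreement_rate(records[a], records[b])
        for a, b in combinations(sorted(records), 2)
    }
```

The resulting pairwise matrix could serve as input to a clustering step. More refined measures, such as ideology scaling, would need a different model than this sketch.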

Use Cases

  • Political science research
  • Constituent tracking and accountability
  • Advocacy group targeting and engagement
  • Predictive modeling of legislative outcomes

Want to Contribute?

Words to Data is open source, and we welcome contributions! Whether you want to work on these roadmap items or have ideas of your own, we'd love to hear from you.

View Issues

Get in Touch

Have questions or feedback, or want to work together? Send us an email.

Email: contact@wordstodata.com

For technical support, please open an issue on GitHub.