Convert Legal Documents Into Diffable Data Structures
Extract hierarchical structure from US Code titles and Public Laws in USLM XML format with full structural preservation.
Compute word-level differences between document versions to precisely track changes over time.
Parse multiple documents concurrently using Rayon to shorten processing time on large document sets.
All data structures implement Serde traits for easy integration with any system.
Automatically identify USC references and amending actions from bills to track legislative changes.
Free and open source software. Use it in your projects, modify it, and distribute it freely.
We're actively working on Python bindings, legal-specific diff algorithms, enhanced bill parsing, pre-built datasets, and congressional vote tracking.
use words_to_data::uslm::parser::parse;

fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Parse one US Code title from a USLM XML file.
    let document = parse("tests/test_data/usc/2025-07-18/usc07.xml", "2025-07-18")?;

    println!("Parsed: {}", document.data.verbose_name);
    println!("USLM ID: {:?}", document.data.uslm_id);
    println!("Children: {}", document.children.len());
    Ok(())
}
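To make the idea of a hierarchical document concrete, here is a minimal sketch of a tree with named nodes and children. The `Node` type, its fields, and the `count` and `paths` methods are hypothetical stand-ins written for illustration; they are not the crate's actual types, which may be organized differently.

```rust
/// Hypothetical stand-in for a parsed element; not the crate's real type.
#[derive(Debug, Clone)]
pub struct Node {
    pub name: String,
    pub children: Vec<Node>,
}

impl Node {
    pub fn new(name: &str, children: Vec<Node>) -> Self {
        Node { name: name.to_string(), children }
    }

    /// Total number of nodes in this subtree, including `self`.
    pub fn count(&self) -> usize {
        1 + self.children.iter().map(Node::count).sum::<usize>()
    }

    /// Slash-separated paths of every node, in depth-first order.
    pub fn paths(&self) -> Vec<String> {
        let mut out = Vec::new();
        self.collect_paths("", &mut out);
        out
    }

    fn collect_paths(&self, prefix: &str, out: &mut Vec<String>) {
        let path = format!("{}/{}", prefix, self.name);
        out.push(path.clone());
        for child in &self.children {
            child.collect_paths(&path, out);
        }
    }
}
```

Giving every node a stable path is one way to line up the same section in two versions of a document before comparing their text.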
use words_to_data::{diff::TreeDiff, uslm::parser::parse};

fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Parse two snapshots of the same title.
    let doc_old = parse("tests/test_data/usc/2025-07-18/usc07.xml", "2025-07-18")?;
    let doc_new = parse("tests/test_data/usc/2025-07-30/usc07.xml", "2025-07-30")?;

    // Diff the two trees and save the result as JSON.
    let diff = TreeDiff::from_elements(&doc_old, &doc_new);
    words_to_data::utils::write_json_file(&diff, "diff.json")?;
    Ok(())
}
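For readers curious what a word-level comparison involves, the sketch below diffs two strings word by word using a longest-common-subsequence table. It is a generic textbook approach using only the standard library, not necessarily the algorithm the crate uses; `WordChange` and `word_diff` are names invented for this illustration.

```rust
/// One step of a word-level edit script (hypothetical type).
#[derive(Debug, Clone, PartialEq)]
pub enum WordChange {
    Same(String),
    Removed(String),
    Added(String),
}

/// Diff two texts word by word using a longest-common-subsequence table.
pub fn word_diff(old: &str, new: &str) -> Vec<WordChange> {
    let a: Vec<&str> = old.split_whitespace().collect();
    let b: Vec<&str> = new.split_whitespace().collect();

    // lcs[i][j] = length of the LCS of a[i..] and b[j..].
    let mut lcs = vec![vec![0usize; b.len() + 1]; a.len() + 1];
    for i in (0..a.len()).rev() {
        for j in (0..b.len()).rev() {
            lcs[i][j] = if a[i] == b[j] {
                lcs[i + 1][j + 1] + 1
            } else {
                lcs[i + 1][j].max(lcs[i][j + 1])
            };
        }
    }

    // Walk the table from the start, emitting one change per word.
    let (mut i, mut j) = (0, 0);
    let mut out = Vec::new();
    while i < a.len() && j < b.len() {
        if a[i] == b[j] {
            out.push(WordChange::Same(a[i].to_string()));
            i += 1;
            j += 1;
        } else if lcs[i + 1][j] >= lcs[i][j + 1] {
            out.push(WordChange::Removed(a[i].to_string()));
            i += 1;
        } else {
            out.push(WordChange::Added(b[j].to_string()));
            j += 1;
        }
    }
    // Whatever is left on either side is pure removal or addition.
    out.extend(a[i..].iter().map(|w| WordChange::Removed(w.to_string())));
    out.extend(b[j..].iter().map(|w| WordChange::Added(w.to_string())));
    out
}
```

Calling `word_diff` on two versions of a sentence yields a sequence of `Same`, `Removed`, and `Added` entries, so a single changed word shows up as one removal and one addition rather than a whole changed line.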
use words_to_data::uslm::bill_parser::parse_bill_amendments;

fn main() -> Result<(), Box<dyn std::error::Error>> {
    let data = parse_bill_amendments("tests/test_data/bills/hr-119-21.xml")?;

    println!("Bill {}: {} amendments found", data.bill_id, data.amendments.len());
    for amendment in &data.amendments {
        println!("\nAmendment at: {}", amendment.source_path);
        println!(" USC sections modified: {}", amendment.target_paths.len());
        println!(" Actions: {:?}", amendment.action_types);
    }
    Ok(())
}
use words_to_data::utils::parse_uslm_directory;

fn main() -> Result<(), Box<dyn std::error::Error>> {
    let documents = parse_uslm_directory("tests/test_data/usc/2025-07-18", "2025-07-18")?;

    println!("Parsed {} documents in parallel", documents.len());
    for doc in documents.iter().take(5) {
        println!(" - {} ({})", doc.data.verbose_name, doc.data.path);
    }
    Ok(())
}
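The general shape of parallel processing over a set of inputs can be sketched with the standard library alone. The example below uses scoped threads as a stand-in for Rayon, which the crate itself relies on; `parallel_map` and its `workers` parameter are hypothetical and exist only for this illustration.

```rust
use std::thread;

/// Apply `work` to every input on a few scoped threads and return the
/// results in input order. A std-only stand-in for Rayon-style iteration.
pub fn parallel_map<T, R, F>(inputs: &[T], workers: usize, work: F) -> Vec<R>
where
    T: Sync,
    R: Send,
    F: Fn(&T) -> R + Sync,
{
    if inputs.is_empty() {
        return Vec::new();
    }
    // Ceiling division so every input lands in exactly one chunk.
    let workers = workers.max(1);
    let chunk_len = (inputs.len() + workers - 1) / workers;
    let work = &work;

    thread::scope(|scope| {
        // Spawn one thread per chunk, then join them in spawn order.
        let handles: Vec<_> = inputs
            .chunks(chunk_len)
            .map(|chunk| scope.spawn(move || chunk.iter().map(work).collect::<Vec<R>>()))
            .collect();
        handles
            .into_iter()
            .flat_map(|h| h.join().expect("worker thread panicked"))
            .collect()
    })
}
```

Because chunks are joined in the order they were spawned, the output order matches the input order. A work-stealing pool such as Rayon's typically balances uneven workloads better than fixed chunks like these.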
Download the latest version and view the version history on crates.io, the official Rust package registry.
Have questions, feedback, or partnership opportunities? We'd love to hear from you.