initial evaluation through todo.md

Mykola Bilokonsky 2025-06-09 12:37:04 -04:00
parent b5e1acf4e3
commit 969e6ddc10
5 changed files with 393 additions and 0 deletions

CLAUDE.md Normal file

@@ -0,0 +1,89 @@
# CLAUDE.md
This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository. Work on this project should follow the priorities defined in [todo.md](todo.md) and the specifications in [spec.md](spec.md).
## Project Overview
Rhizome-node is a distributed, peer-to-peer database engine that implements a rhizomatic (decentralized, non-hierarchical) data model. It synchronizes data across multiple nodes without a central authority using immutable "deltas" as the fundamental unit of change. There is a specification for the behavior of this system in [spec.md](spec.md).
## Development Commands
```bash
# Build the TypeScript project
npm run build
# Build in watch mode
npm run build:watch
# Run tests
npm test
# Run a specific test file
npm test -- __tests__/delta.ts
# Run linter
npm run lint
# Generate coverage report
npm run coverage
# Run the example application
npm run example-app
```
## Architecture Overview
### Core Concepts
1. **Deltas**: Immutable change records that describe modifications to entities. Each delta contains:
- Unique ID and timestamps
- Creator and host information
- Pointers defining entity/property relationships
   - DeltaV2 is the current format (DeltaV1 is legacy); a minimal sketch follows this list
2. **Views**: Different ways to interpret the delta stream:
- **Lossless View**: Stores all deltas without conflict resolution
- **Lossy Views**: Apply conflict resolution (e.g., Last-Write-Wins)
- Custom resolvers can be implemented
3. **Collections**: Group related entities (similar to database tables)
- Support typed collections via `TypedCollection<T>`
- Implement CRUD operations through delta generation
4. **Networking**: Dual transport layer:
- ZeroMQ for efficient binary communication
- libp2p for decentralized peer discovery
- Pub/sub for delta propagation
- Request/reply for synchronization
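For orientation, here is a minimal sketch of a delta and its pointers. The field names are illustrative assumptions, not the actual definitions in `src/delta.ts`:

```typescript
// Illustrative sketch only -- consult src/delta.ts for the real types.
// Pointer fields follow the spec: name / target / optional context.
interface PointerSketch {
  name: string;                       // meaning of the pointer within this delta
  target: string | number | boolean;  // a reference ID or a primitive value
  context?: string;                   // targeted object's property, when applicable
}

interface DeltaSketch {
  id: string;          // unique ID
  timeCreated: number; // creation timestamp
  creator: string;     // creating peer
  host: string;        // host that recorded the delta
  pointers: PointerSketch[]; // DeltaV2 encodes these as an object, V1 as an array
}

// A delta asserting that entity "alice" has the name "Alice":
const setName: DeltaSketch = {
  id: 'delta-0001',
  timeCreated: Date.now(),
  creator: 'peer-a',
  host: 'host-1',
  pointers: [
    { name: 'named', target: 'alice', context: 'name' }, // entity + property
    { name: 'name', target: 'Alice' },                   // the asserted value
  ],
};
```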
### Key Files and Entry Points
- `src/node.ts`: Main `RhizomeNode` class orchestrating all components
- `src/delta.ts`: Delta data structures and conversion logic
- `src/lossless.ts`: Core lossless view implementation
- `src/collection-basic.ts`: Basic collection implementation
- `src/http/api.ts`: REST API endpoints
- `src/pub-sub.ts`: Network communication layer
### Testing Patterns
- Unit tests in `__tests__/` directory
- Multi-node integration tests in `__tests__/run/`
- Use Jest with experimental VM modules
- Test files follow the pattern `{feature}.ts`
### HTTP API Structure
The HTTP API provides RESTful endpoints (example requests follow the list):
- `GET/PUT /collection/:name/:id` - Entity operations
- `GET /peers` - Peer information
- `GET /deltas/stats` - Delta statistics
- `GET /lossless/:entityId` - Raw delta access
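Example requests, assuming a local node whose HTTP API listens on port 3000 (an assumption; use whatever port your node is configured with):

```bash
# Write an entity into the "users" collection (port 3000 is assumed)
curl -X PUT http://localhost:3000/collection/users/alice \
  -H "Content-Type: application/json" \
  -d '{"name": "Alice"}'

# Read it back
curl http://localhost:3000/collection/users/alice

# Peer information and delta statistics
curl http://localhost:3000/peers
curl http://localhost:3000/deltas/stats

# Raw delta access for an entity
curl http://localhost:3000/lossless/alice
```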
### Important Implementation Notes
- All data modifications go through deltas - never modify state directly
- Deltas are immutable once created
- Use `Context.getOrCreate()` for singleton access
- Network ports: publish (default 4000) and request (default 4001)
- Debug logging uses namespaces like `rhizome:*`
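If the `rhizome:*` namespaces follow the usual `DEBUG` environment-variable convention (an assumption worth verifying against the source), logging can be enabled like so:

```bash
# Assumes the standard DEBUG env-var convention for namespaced logging
DEBUG=rhizome:* npm test
DEBUG=rhizome:* npm run example-app
```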


@@ -1,3 +1,5 @@
See [spec.md](spec.md) for additional specification details about this project.
# Concepts
| | Implemented | Notes |


@@ -0,0 +1,130 @@
# Spec vs Implementation Test Coverage Report
## Executive Summary
The rhizome-node implementation demonstrates strong alignment with core spec concepts but lacks implementation and testing for several advanced features. The fundamental delta → lossless → lossy transformation pipeline is well-implemented, while query systems, relational features, and advanced conflict resolution remain unimplemented.
## Core Concept Alignment
### ✅ Well-Aligned Concepts
1. **Delta Structure**
- **Spec**: Deltas contain pointers with name/target/context fields
- **Implementation**: Correctly implements both V1 (array) and V2 (object) formats
- **Tests**: Basic format conversion tested, but validation gaps exist
2. **Lossless Views**
- **Spec**: Full inventory of all deltas composing an object
- **Implementation**: `LosslessViewDomain` correctly accumulates deltas by entity/property
- **Tests**: Good coverage of basic transformation, filtering by creator/host
3. **Lossy Views**
- **Spec**: Compression of lossless views using resolution strategies
- **Implementation**: Initializer/reducer/resolver pattern provides flexibility
- **Tests**: Domain-specific example (Role/Actor/Film) demonstrates concept
4. **Basic Conflict Resolution**
- **Spec**: Resolution strategies for collapsing delta sets
   - **Implementation**: Last-Write-Wins resolver implemented (the pattern is sketched after this list)
- **Tests**: Basic LWW tested, but limited to simple cases
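For reference, a minimal sketch of the initializer/reducer/resolver pattern with a Last-Write-Wins reducer; all names here are illustrative, not the actual API in `src/lossless.ts`:

```typescript
// Illustrative only -- names are assumptions, not the real resolver API.
interface DeltaLike {
  timeCreated: number; // write timestamp used for ordering
  value: unknown;      // the asserted value for this property
}

interface LwwState { value: unknown; timeCreated: number }

// initializer: starting accumulator for a property
const initializer = (): LwwState => ({ value: undefined, timeCreated: 0 });

// reducer: fold each delta in, keeping the newest write
const reducer = (acc: LwwState, d: DeltaLike): LwwState =>
  d.timeCreated >= acc.timeCreated
    ? { value: d.value, timeCreated: d.timeCreated }
    : acc;

// resolver: extract the lossy value from the accumulated state
const resolver = (acc: LwwState): unknown => acc.value;

// Collapsing a delta set for one property:
const deltas: DeltaLike[] = [
  { timeCreated: 100, value: 'Alice' },
  { timeCreated: 200, value: 'Alicia' },
];
const resolved = resolver(deltas.reduce(reducer, initializer())); // 'Alicia'
```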
### ⚠️ Partial Implementations
1. **Schemas**
- **Spec**: Templates for object compilation with property specification
- **Implementation**: `TypedCollection<T>` provides thin typing layer
- **Tests**: No schema validation or constraint testing
2. **Negation**
- **Spec**: Specific delta type with "negates" pointer
- **Implementation**: Not explicitly implemented
- **Tests**: No negation tests
3. **Transactions**
- **Spec**: Not explicitly mentioned but implied by delta grouping
- **Implementation**: Transaction structure exists in types
- **Tests**: Transaction filtering marked as TODO
### ❌ Missing Implementations
1. **Query System**
- **Spec**: JSON Logic expressions for filtering
- **Implementation**: Types exist but no implementation
- **Tests**: All query tests are skipped
2. **Relational Features**
- **Spec**: Schema-based relationships between objects
- **Implementation**: `collection-relational.ts` exists but minimal
- **Tests**: All relational tests are skipped
3. **Advanced Conflict Resolution**
- **Spec**: Multiple resolution strategies (min/max/average for numerics)
- **Implementation**: Only LWW implemented
- **Tests**: No tests for alternative strategies
4. **Nested Object Resolution**
- **Spec**: Schema-controlled depth limiting to prevent infinite recursion
- **Implementation**: Not implemented
- **Tests**: No tests for nested object handling
## Test Coverage Gaps
### Critical Missing Tests
1. **Delta Validation**
- No tests for invalid delta structures
- No tests for required field validation
- No tests for pointer consistency
2. **Schema Enforcement**
- No tests for schema validation during view generation
- No tests for property type enforcement
- No tests for nested schema application
3. **Concurrent Operations**
- No tests for concurrent delta creation
- No tests for timestamp-based ordering edge cases
- No tests for distributed conflict scenarios
4. **Network Resilience**
- Limited peer connection testing
- No tests for network partitions
- No tests for delta propagation failures
### Performance and Scale
1. **Large Dataset Handling**
- No tests for entities with thousands of deltas
- No tests for memory efficiency of views
- No tests for query performance on large collections
2. **View Materialization**
- No tests for incremental view updates
- No tests for view caching strategies
- No tests for partial view generation
## Recommendations
### High Priority
1. **Implement Query System**: The skipped query tests suggest this is a planned feature
2. **Add Schema Validation**: Essential for data integrity in distributed systems
3. **Expand Conflict Resolution**: Implement numeric aggregation strategies
4. **Test Edge Cases**: Add validation, error handling, and concurrent operation tests
### Medium Priority
1. **Implement Negation**: Core spec concept currently missing
2. **Add Nested Object Handling**: Prevent infinite recursion with schema depth limits
3. **Enhance Transaction Support**: Complete transaction-based filtering
4. **Improve Network Testing**: Add resilience and partition tolerance tests
### Low Priority
1. **Performance Benchmarks**: Add tests for scale and efficiency
2. **Advanced CRDT Features**: Implement vector clocks or hybrid logical clocks
3. **View Optimization**: Add incremental update mechanisms
## Conclusion
The rhizome-node implementation successfully captures the core concepts of the spec but requires significant work to achieve full compliance. The foundation is solid, with the delta/lossless/lossy pipeline working as designed. However, advanced features like queries, schemas, and sophisticated conflict resolution remain unimplemented. The test suite would benefit from expanded coverage of edge cases, validation, and distributed system scenarios.

spec.md Normal file

@@ -0,0 +1,18 @@
* A `delta` is an immutable atomic unit that relates one or more values in semantically meaningful ways as of some point in time. A delta can be thought of as both a `CRDT` and as a `hyper-edge` in the implicit hypergraph that makes up the rhizome. A delta contains one or more `pointers`.
* A `pointer` is composed of at least two and possibly three fields. A `delta` contains a set of `pointers`. The fields of a pointer are:
* `name` - identifies the meaning of the pointer from the perspective of the delta that contains it.
* `target` - identifies a `value` to associate with the `name`.
* `context` - optionally, when pointing at an `object`, the `context` identifies the field or property of that object with which this delta is associated.
* A `value` is one of two kinds of datum that a `delta` can refer to:
* a `reference` is a UUID or other value understood to be pointing at either a `delta` or an `object`.
* a `primitive` is a literal string, number or boolean value whose meaning is not tied up in its being a reference to a larger whole.
* An `object` is a composite entity whose entire existence is encoded as the set of deltas that reference it. An object is identified by a unique `reference`, and every delta that includes that `reference` is asserting a claim about some property of that object.
* A `negation` is a specific kind of delta that includes a pointer with the name `negates`, a `target` reference to another delta, and a `context` called `negated_by`.
* A `schema` represents a template by which an `object` can be compiled into a `lossless view`. A schema specifies which properties of that object are included, and it specifies schemas for the objects referenced by the deltas within those properties. A schema must terminate in primitive schemas to avoid an infinite regress.
* For instance, a `lossless view` "User" of a user may include references to friends. If those friends are in turn encoded as instances of the "User" schema then all of *their* friends would be fully encoded, etc.
* This could lead to circular references and arbitrarily deep nesting, which runs into the problem of "returning the entire graph". So our schema should specify, for instance, that the "friends" field apply the "Summary" schema to referenced users rather than the "User" schema, where the "Summary" schema simply resolves to username and photo.
* A `lossless view` is a representation of an `object` that includes a full inventory of all of the deltas that compose that object. So for instance, a lossless view of the object representing the user "Alice" might include `alice.name`, which contains an array of all deltas with a pointer whose `target` is the ID of Alice and whose context is `name`. Such deltas would likely include a second pointer named `name` whose target is the primitive string "Alice" (a sketch of such a delta follows this list).
* A `lossless view` may also include nested delta/object layering. Consider `alice.friends`, which would include all deltas asserting friendship between Alice and some other person. Each such delta would reference a different friend object. In a lossless view, these references would be expanded to contain lossless views of those friends. Schemas, as defined above, would be applied to constrain tree depth and avoid infinite regress.
* A `lossy view` is a compression of a `lossless view` that removes delta information and flattens the structure into a standard domain object, typically in JSON. So instead of `alice.name` resolving to a list of deltas that assert the object's name it might simply resolve to `"alice"`.
* Note that in a lossless view any property of an object necessarily resolves to a set of deltas, even if it's an empty set, because we cannot anticipate how many deltas exist that assert values on that context.
* In collapsing a lossless view into a lossy view we may specify `resolution strategies` on each field of the schema. A resolution strategy takes as input the set of all deltas targeting that context and returns as output the value in the lossy view. So if we have 15 deltas asserting the value for an object's name, our resolution strategy may simply say "return the target of the `name` pointer associated with the most recent delta", or it may say "return an array of names". If the value is numeric it may say "take the max" or "take the min" or "take the average".
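Tying these definitions together, a hedged sketch (the shapes are illustrative, not normative): one delta asserting Alice's name, the lossless view that inventories it, and the lossy view after a "most recent delta" resolution strategy.

```typescript
// Illustrative shapes only; the spec does not fix a concrete encoding.
const nameDelta = {
  id: 'delta-0001',
  pointers: [
    { name: 'named', target: 'alice', context: 'name' }, // the object + property
    { name: 'name', target: 'Alice' },                   // the primitive value
  ],
};

// Lossless view: every property resolves to the (possibly empty) set of
// deltas asserting values on that context.
const losslessAlice = {
  id: 'alice',
  name: [nameDelta], // full inventory of deltas targeting alice.name
  friends: [],       // empty set: no friendship deltas yet
};

// Lossy view: a resolution strategy collapses each delta set to a value,
// e.g. "return the `name` target of the most recent delta".
const lossyAlice = { id: 'alice', name: 'Alice' };
```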

todo.md Normal file

@@ -0,0 +1,154 @@
# TODO - Rhizome Node Spec Parity
This document tracks work needed to achieve full specification compliance, organized by priority and dependencies.
## Phase 1: Foundation (Prerequisites)
### 1.1 Delta Validation & Error Handling
- [ ] Implement delta structure validation
- [ ] Add tests for invalid delta formats
- [ ] Add tests for required fields (id, created, pointers)
- [ ] Implement proper error types for delta operations
- [ ] Add validation for pointer consistency
### 1.2 Complete Transaction Support
- [ ] Implement transaction-based filtering in lossless views
- [ ] Add transaction grouping in delta streams
- [ ] Test atomic transaction operations
- [ ] Add transaction rollback capabilities
### 1.3 Schema Foundation
- [ ] Design schema type definitions based on spec
- [ ] Implement basic schema validation
- [ ] Create schema registry/storage mechanism
- [ ] Add property type enforcement
- [ ] Test schema application to collections
## Phase 2: Core Features (Spec Compliance)
### 2.1 Negation Deltas
- [ ] Implement negation delta type with "negates" pointer
- [ ] Add "negated_by" context handling
- [ ] Update lossless view to handle negations
- [ ] Update lossy resolvers to respect negations
- [ ] Add comprehensive negation tests
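Per the spec's definition, a negation delta contains a pointer named `negates` targeting another delta, with context `negated_by`; a sketch (shape illustrative):

```typescript
// Illustrative sketch of a negation delta per spec.md.
const negation = {
  id: 'delta-0002',
  pointers: [
    { name: 'negates', target: 'delta-0001', context: 'negated_by' },
  ],
};
// Lossless and lossy views would then treat delta-0001 as negated.
```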
### 2.2 Advanced Conflict Resolution
- [ ] Implement numeric aggregation resolvers (min/max/sum/average)
- [ ] Add timestamp-based ordering with tie-breaking
- [ ] Implement confidence level resolution
- [ ] Add custom resolver plugin system
- [ ] Test concurrent write scenarios
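The numeric strategies above can be sketched as pure functions over the resolved numeric values for a property; the signatures are hypothetical, pending the plugin system:

```typescript
// Hypothetical signatures -- the eventual resolver plugin API may differ.
type NumericResolver = (values: number[]) => number | undefined;

const sum = (v: number[]): number => v.reduce((a, b) => a + b, 0);

const minResolver: NumericResolver = (v) => (v.length ? Math.min(...v) : undefined);
const maxResolver: NumericResolver = (v) => (v.length ? Math.max(...v) : undefined);
const avgResolver: NumericResolver = (v) => (v.length ? sum(v) / v.length : undefined);
```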
### 2.3 Nested Object Resolution
- [ ] Implement schema-controlled depth limiting
- [ ] Add circular reference detection
- [ ] Create "Summary" schema type for references
- [ ] Test deep nesting scenarios
- [ ] Add performance tests for large graphs
## Phase 3: Query System
### 3.1 Query Engine Foundation
- [ ] Implement JSON Logic parser
- [ ] Create query planner for lossless views
- [ ] Add query execution engine
- [ ] Implement query result caching
- [ ] Enable the skipped query tests
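The JSON Logic format itself is already fixed by packages such as `json-logic-js`; the filtering step a query engine would run over a resolved (lossy) entity might look like this sketch (the surrounding query API is hypothetical):

```typescript
import jsonLogic from 'json-logic-js'; // npm install json-logic-js

// Rule: entities whose age is at least 21
const rule = { '>=': [{ var: 'age' }, 21] };

// Applied to one resolved entity; a query engine would map this
// over the members of a collection's lossy view.
const entity = { id: 'alice', age: 30 };
console.log(jsonLogic.apply(rule, entity)); // true
```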
### 3.2 Query Optimizations
- [ ] Add index support for common queries
- [ ] Implement query cost estimation
- [ ] Add query result streaming
- [ ] Test query performance at scale
## Phase 4: Relational Features
### 4.1 Relational Schema Expression
- [ ] Design relational schema DSL
- [ ] Implement foreign key constraints
- [ ] Add relationship traversal in queries
- [ ] Implement join operations in lossy views
- [ ] Enable the skipped relational tests
### 4.2 Constraint Validation
- [ ] Add unique constraints
- [ ] Implement required field validation
- [ ] Add custom constraint functions
- [ ] Test constraint violations and error handling
## Phase 5: Advanced Features
### 5.1 View Optimizations
- [ ] Implement incremental view updates
- [ ] Add view materialization strategies
- [ ] Create view caching layer
- [ ] Add partial view generation
### 5.2 Network Resilience
- [ ] Add network partition handling
- [ ] Implement delta retry mechanisms
- [ ] Add peer health monitoring
- [ ] Test split-brain scenarios
### 5.3 Performance & Scale
- [ ] Add benchmarks for large datasets
- [ ] Implement delta pruning strategies
- [ ] Add memory-efficient view generation
- [ ] Create performance regression tests
## Phase 6: Developer Experience
### 6.1 Better TypeScript Support
- [ ] Improve TypedCollection type inference
- [ ] Add stricter schema typing
- [ ] Create type guards for delta operations
- [ ] Add better IDE autocomplete support
### 6.2 Debugging & Monitoring
- [ ] Add delta stream visualization
- [ ] Create conflict resolution debugger
- [ ] Add performance profiling hooks
- [ ] Implement comprehensive logging
### 6.3 Documentation
- [ ] Document schema definition format
- [ ] Create resolver implementation guide
- [ ] Add query language documentation
- [ ] Write migration guides from v1 to v2
## Testing Priorities
### High Priority (Block Progress)
1. Delta validation tests
2. Transaction support tests
3. Basic schema validation tests
4. Negation handling tests
### Medium Priority (Needed for Features)
1. Advanced resolver tests
2. Nested object tests
3. Query engine tests
4. Relational constraint tests
### Low Priority (Nice to Have)
1. Performance benchmarks
2. Network resilience tests
3. Large-scale integration tests
## Implementation Order
1. **Start with Phase 1** - These are foundational requirements
2. **Phase 2.1 (Negation)** - Core spec feature that affects all views
3. **Phase 2.2 (Resolvers)** - Needed for proper lossy views
4. **Phase 3 (Query)** - Unlocks powerful data access
5. **Phase 2.3 (Nesting)** - Depends on schemas and queries
6. **Phase 4 (Relational)** - Builds on query system
7. **Phase 5 & 6** - Optimization and polish
## Notes
- Each phase should include comprehensive tests before moving to the next
- Schema design in Phase 1.3 will impact many subsequent phases
- Query system (Phase 3) may reveal needs for index structures
- Consider creating integration tests that span multiple phases