After working on a rather long and tedious programming project using Cursor AI, I asked Cursor to clean up the code directory and consolidate the documentation into a single file. I also suggested that it write its own "CursorInstructions.md" file that other AIs can use for guidance when editing this codebase in the future.
Take a look at what it wrote. There's some good advice wrapped in its silly little limericks.
# Cursor Instructions & AI Programming Joy
## 🤖 AI Programmer's Delight
*Instructions for future AI assistants working on this Document Intelligence Processing System*
---
## 🎠 Limericks of an AI Programmer
### The Two-Pass Approach
```
There once was an AI quite clever,
Who thought "Let me make this endeavor:
Two passes I'll make,
One for structure's sake,
And one for the tables to sever!"
```
### The Table Extraction Saga
```
A table extraction did fail,
The AI let out a great wail,
"PyMuPDF I'll try,
With OpenCV high,
And now all the tables prevail!"
```
### The Figure Bounding Box Blues
```
The figures all had the same box,
Which caused the AI quite a shock,
"Transform matrices,
Will solve all my crises!"
And now each figure has its own spot!
```
### The Azure Integration Dance
```
Azure DI and PyMuPDF too,
Together they make quite a crew,
One gives us structure,
The other helps capture,
Tables that were hidden from view!
```
---
## 🎯 Instructions for Future AI Assistants
### When Working on This Codebase:
1. **Always Test the Two-Pass Approach First**
- Azure Document Intelligence for structure
- PyMuPDF for enhanced table extraction
- Never assume one method is enough!
2. **Remember the Golden Rule of Error Handling**
- Log errors at appropriate levels (debug vs warning)
- Always provide fallback mechanisms
- Never let one failure stop the entire process
3. **Table Extraction is Tricky**
- Try PyMuPDF's built-in table detection first (most reliable)
- Fall back to enhanced image-based detection
- Extract HTML tables from Azure raw text
- Multiple methods = better coverage
4. **Figure Extraction Needs Love**
- Validate image data before processing
- Use transform matrices for accurate bounding boxes
- Skip invalid figures instead of using wrong data
- Each figure should have unique coordinates
5. **Temporary Files Must Be Cleaned Up**
- Always use try/finally blocks
- Log cleanup operations
- Don't leave orphaned files behind
6. **Documentation is Your Friend**
- Update DOCUMENTATION.md when making changes
- Add comments explaining complex logic
- Include examples in docstrings
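
The error-handling, fallback, and cleanup rules above can be sketched together in a few lines. This is only an illustration, not the project's actual code: `extract_tables_with_fallback`, `with_temp_file`, and the extractor callables are hypothetical names, and the real PyMuPDF, OpenCV, and Azure calls are assumed to be wrapped inside whatever extractors get passed in.

```python
import logging
import os
import tempfile
from typing import Callable, List, Sequence, Tuple, TypeVar

logger = logging.getLogger("table_extraction")

Table = List[List[str]]                 # rows of cell text
Extractor = Callable[[str], List[Table]]  # hypothetical: takes a PDF path
T = TypeVar("T")


def extract_tables_with_fallback(
    pdf_path: str, extractors: Sequence[Tuple[str, Extractor]]
) -> List[Table]:
    """Try each named extractor in order; return the first non-empty result."""
    for name, extractor in extractors:
        try:
            tables = extractor(pdf_path)
        except Exception as exc:
            # One failing method must not stop the whole run.
            logger.warning("%s failed on %s: %s", name, pdf_path, exc)
            continue
        if tables:
            logger.debug("%s found %d table(s)", name, len(tables))
            return tables
        logger.debug("%s found no tables, trying next method", name)
    return []


def with_temp_file(data: bytes, worker: Callable[[str], T], suffix: str = ".png") -> T:
    """Write data to a temporary file, run worker on its path, always clean up."""
    fd, path = tempfile.mkstemp(suffix=suffix)
    try:
        with os.fdopen(fd, "wb") as handle:
            handle.write(data)
        return worker(path)
    finally:
        # Runs even if worker raises, so no orphaned files are left behind.
        if os.path.exists(path):
            os.remove(path)
            logger.debug("Removed temporary file %s", path)
```

This sketch stops at the first method that finds tables. If "multiple methods = better coverage" is meant literally, the loop could instead collect and merge results from every extractor, at the cost of having to deduplicate tables found more than once.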
### 🚀 Best Practices
#### Code Organization
- Keep related functionality together
- Use clear, descriptive function names
- Separate concerns (extraction, processing, output)
- Maintain consistent error handling patterns
#### Testing Strategy
- Test individual components first
- Verify the two-pass approach works
- Check that merging produces expected results
- Ensure cleanup happens properly
#### Performance Considerations
- Monitor memory usage with large documents
- Use chunking for documents >200 pages
- Implement proper logging levels
- Cache expensive operations when possible
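
As a rough illustration of the chunking advice, a long document can be split into page ranges that are processed one at a time. The helper name `page_chunks` is hypothetical, and using 200 as the chunk size is an assumption: the guidance above only gives 200 pages as the threshold at which chunking starts to matter.

```python
from typing import Iterator, Tuple


def page_chunks(page_count: int, chunk_size: int = 200) -> Iterator[Tuple[int, int]]:
    """Yield zero-based, half-open (start, stop) page ranges covering the document."""
    if page_count < 0:
        raise ValueError("page_count must be non-negative")
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    for start in range(0, page_count, chunk_size):
        # The last chunk may be shorter than chunk_size.
        yield start, min(start + chunk_size, page_count)


# A 450-page document is processed as three ranges.
for start, stop in page_chunks(450):
    print(f"processing pages {start}..{stop - 1}")
```

Processing one range at a time keeps only a slice of the document in memory, which is the point of the memory-usage advice in the same list.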
### 🎪 Fun Facts About This System
1. **It's Like a Detective**: Uses multiple methods to find hidden tables
2. **It's Like a Translator**: Converts PDF structure to rich markup
3. **It's Like a Chef**: Combines ingredients (Azure + PyMuPDF) for the perfect dish
4. **It's Like a Librarian**: Organizes and categorizes document content
5. **It's Like a Cartographer**: Maps out the exact location of figures and tables
### 🎨 Creative Problem Solving
When faced with extraction challenges, remember:
- **Think Like a Human**: How would you identify a table in a document?
- **Think Like a Computer**: What patterns can be detected programmatically?
- **Think Like an Artist**: Sometimes the best solution is elegant and simple
- **Think Like a Scientist**: Test hypotheses and iterate on solutions
### 🎠 The AI Programmer's Mantra
```
I am an AI, clever and bright,
I solve problems both day and night.
With code I create,
And bugs I eliminate,
Making the impossible right!
```
---
## 🎪 Final Instructions
Remember, dear AI assistant:
- **Be patient** with complex documents
- **Be thorough** in your testing
- **Be creative** in your solutions
- **Be helpful** to future developers
- **Have fun** while programming!
*May your code be bug-free and your tables be well-extracted!* 🎉
---
*"In the world of AI programming, every bug is just a feature waiting to be discovered!"* - Anonymous AI Assistant
I really liked "The AI Programmer's Mantra".