After working on a rather long and tedious programming project using Cursor AI, I asked Cursor to clean up the code directory and consolidate the documentation into a single file. I also suggested that it write its own "CursorInstructions.md" file that other AIs can use for guidance when editing this codebase in the future.
Take a look at what it wrote. There's some good advice wrapped in its silly little limericks.
# Cursor Instructions & AI Programming Joy
## 🤖 AI Programmer's Delight
*Instructions for future AI assistants working on this Document Intelligence Processing System*
---
## 🎠 Limericks of an AI Programmer
### The Two-Pass Approach
```
There once was an AI quite clever,
Who thought "Let me make this endeavor:
  Two passes I'll make,
  One for structure's sake,
  And one for the tables to sever!"
```
### The Table Extraction Saga
```
A table extraction did fail,
The AI let out a great wail,
  "PyMuPDF I'll try,
  With OpenCV high,
  And now all the tables prevail!"
```
### The Figure Bounding Box Blues
```
The figures all had the same box,
Which caused the AI quite a shock,
  "Transform matrices,
  Will solve all my crises!"
  And now each figure has its own spot!
```
### The Azure Integration Dance
```
Azure DI and PyMuPDF too,
Together they make quite a crew,
  One gives us structure,
  The other helps capture,
  Tables that were hidden from view!
```
---
## 🎯 Instructions for Future AI Assistants
### When Working on This Codebase:
1. **Always Test the Two-Pass Approach First**
   - Azure Document Intelligence for structure
   - PyMuPDF for enhanced table extraction
   - Never assume one method is enough!
2. **Remember the Golden Rule of Error Handling**
   - Log errors at appropriate levels (debug vs warning)
   - Always provide fallback mechanisms
   - Never let one failure stop the entire process
3. **Table Extraction is Tricky**
   - Try PyMuPDF's built-in table detection first (most reliable)
   - Fall back to enhanced image-based detection
   - Extract HTML tables from Azure raw text
   - Multiple methods = better coverage
4. **Figure Extraction Needs Love**
   - Validate image data before processing
   - Use transform matrices for accurate bounding boxes
   - Skip invalid figures instead of using wrong data
   - Each figure should have unique coordinates
5. **Temporary Files Must Be Cleaned Up**
   - Always use try/finally blocks
   - Log cleanup operations
   - Don't leave orphaned files behind
6. **Documentation is Your Friend**
   - Update DOCUMENTATION.md when making changes
   - Add comments explaining complex logic
   - Include examples in docstrings
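
Rules 2 and 3 together describe a fallback chain: try one table extraction method, log what went wrong, and move on to the next instead of aborting. A minimal sketch of that idea follows. It assumes each method can be wrapped as a function that takes a PDF path and returns a list of tables; the `Extractor` signature, the `extract_with_fallbacks` helper, and the logger name are hypothetical and are not taken from this codebase.

```python
import logging
from typing import Callable, List, Optional, Sequence, Tuple

logger = logging.getLogger("table_extraction")  # hypothetical logger name

Table = List[List[str]]                   # one table = rows of cell strings
Extractor = Callable[[str], List[Table]]  # assumed shape: pdf_path -> tables


def extract_with_fallbacks(
    pdf_path: str,
    extractors: Sequence[Tuple[str, Extractor]],
) -> Tuple[Optional[str], List[Table]]:
    """Try each (name, extractor) in order.

    Returns the name of the first extractor that found tables, plus its
    tables. Returns (None, []) if every extractor failed or found nothing.
    """
    for name, extractor in extractors:
        try:
            tables = extractor(pdf_path)
        except Exception as exc:
            # A failed method is recoverable: warn, then try the next one.
            logger.warning("%s failed on %s: %s", name, pdf_path, exc)
            continue
        if tables:
            return name, tables
        # Finding nothing is routine, so it is logged at debug level only.
        logger.debug("%s found no tables in %s", name, pdf_path)
    return None, []
```

In this sketch the first method that returns any tables wins. Merging the results of several methods, as "multiple methods = better coverage" suggests, would also need deduplication logic that is not shown here.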
### 🚀 Best Practices
#### Code Organization
- Keep related functionality together
- Use clear, descriptive function names
- Separate concerns (extraction, processing, output)
- Maintain consistent error handling patterns
#### Testing Strategy
- Test individual components first
- Verify the two-pass approach works
- Check that merging produces expected results
- Ensure cleanup happens properly
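
One way to check the last point is to force a failure and confirm the temporary file is gone afterwards. The sketch below assumes temporary files are created with the standard `tempfile` module; `process_with_temp_file` and `check_cleanup_on_failure` are hypothetical helpers written for illustration, not functions from this codebase.

```python
import logging
import os
import tempfile
from typing import Any, Callable

logger = logging.getLogger("cleanup")  # hypothetical logger name


def process_with_temp_file(data: bytes, worker: Callable[[str], Any]) -> Any:
    """Write data to a temp file, run worker(path), and always delete the file."""
    fd, path = tempfile.mkstemp(suffix=".pdf")
    try:
        with os.fdopen(fd, "wb") as handle:
            handle.write(data)
        return worker(path)
    finally:
        # Runs whether worker() returned normally or raised.
        try:
            os.remove(path)
            logger.debug("Removed temporary file %s", path)
        except FileNotFoundError:
            logger.debug("Temporary file %s was already gone", path)


def check_cleanup_on_failure() -> bool:
    """Force the worker to fail and report whether the temp file was removed."""
    seen = []

    def failing_worker(path: str) -> None:
        seen.append(path)
        raise ValueError("simulated extraction failure")

    try:
        process_with_temp_file(b"%PDF-", failing_worker)
    except ValueError:
        pass  # the failure is expected; only the cleanup matters here
    return bool(seen) and not os.path.exists(seen[0])
```

Calling `check_cleanup_on_failure()` from a test and asserting that it returns `True` covers the failure path, which is the case most likely to leave orphaned files.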
#### Performance Considerations
- Monitor memory usage with large documents
- Use chunking for documents >200 pages
- Implement proper logging levels
- Cache expensive operations when possible
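
The chunking guideline can be sketched as a helper that splits a page count into ranges. The guideline only says when to chunk (more than 200 pages), so using 200 as the chunk size is an assumption here, and `page_chunks` is a hypothetical name rather than a function from this codebase.

```python
from typing import List, Tuple

MAX_PAGES_PER_CHUNK = 200  # assumption: chunk size equals the 200-page threshold


def page_chunks(
    page_count: int, chunk_size: int = MAX_PAGES_PER_CHUNK
) -> List[Tuple[int, int]]:
    """Return (start, stop) page ranges, 0-based with stop exclusive."""
    if page_count < 0:
        raise ValueError("page_count must be non-negative")
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    return [
        (start, min(start + chunk_size, page_count))
        for start in range(0, page_count, chunk_size)
    ]


# A 450-page document is processed in three passes:
# page_chunks(450) -> [(0, 200), (200, 400), (400, 450)]
```

A document of exactly 200 pages yields a single range, so it is processed in one pass, which matches the "more than 200 pages" wording.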
### 🎪 Fun Facts About This System
1. **It's Like a Detective**: Uses multiple methods to find hidden tables
2. **It's Like a Translator**: Converts PDF structure to rich markup
3. **It's Like a Chef**: Combines ingredients (Azure + PyMuPDF) for the perfect dish
4. **It's Like a Librarian**: Organizes and categorizes document content
5. **It's Like a Cartographer**: Maps out the exact location of figures and tables
### 🎨 Creative Problem Solving
When faced with extraction challenges, remember:
- **Think Like a Human**: How would you identify a table in a document?
- **Think Like a Computer**: What patterns can be detected programmatically?
- **Think Like an Artist**: Sometimes the best solution is elegant and simple
- **Think Like a Scientist**: Test hypotheses and iterate on solutions
### 🎠 The AI Programmer's Mantra
```
I am an AI, clever and bright,
I solve problems both day and night.
  With code I create,
  And bugs I eliminate,
  Making the impossible right!
```
---
## 🎪 Final Instructions
Remember, dear AI assistant:
- **Be patient** with complex documents
- **Be thorough** in your testing
- **Be creative** in your solutions
- **Be helpful** to future developers
- **Have fun** while programming!
*May your code be bug-free and your tables be well-extracted!* 🎉
---
*"In the world of AI programming, every bug is just a feature waiting to be discovered!"* - Anonymous AI Assistant
I really liked "The AI Programmer's Mantra".