Software, tools, and data resources I use for search and development.
I get asked a lot about the software and tools I use for search engine architecture, data processing, and development work. Here are my recommendations organized by category.
Hardware
16-inch MacBook Pro, M4, 48 GB RAM (2024)
Essential for running multiple search engine instances, processing large datasets, and handling complex indexing operations. The M4 handles heavy computational loads for search algorithm development without breaking a sweat.
Apple Magic Trackpad
The multi-touch gestures are invaluable for navigating between search interfaces, monitoring dashboards, and managing multiple development environments simultaneously.
Herman Miller Aeron Chair
When you are spending hours optimizing search algorithms and analyzing query performance, proper ergonomics makes a significant difference in productivity.
Development Tools
Visual Studio Code
Perfect for editing Solr configurations, OpenSearch mappings, and search query JSON. The syntax highlighting and multi-cursor editing capabilities make managing complex search schemas much more efficient.
iTerm2
Essential for managing multiple search engine instances, running indexing jobs, and monitoring cluster health across different environments. The split panes are invaluable for watching logs while executing commands.
TablePlus
Crucial for examining data before indexing, debugging search results, and analyzing query performance metrics. Makes it easy to understand data relationships when designing search architectures.
Search & Data Platforms
Apache Solr
My go-to search platform for enterprise implementations. Excellent for complex faceted search, geographic queries, and handling large-scale indexing operations with fine-grained control over relevance scoring.
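A faceted query with a geographic filter can be sketched as a plain Solr request. This is a minimal illustration, not my production setup: the collection name, `category`, and `store_location` fields are hypothetical placeholders for your own schema.

```python
# Sketch of a Solr faceted search request with a geospatial filter.
# Field names (category, store_location) and the collection are
# hypothetical; sending the request would need e.g. the `requests` package.

def build_faceted_geo_query(text, lat, lon, radius_km):
    """Build Solr query params: full-text match, category facets,
    and a geofilt restricting results to a radius around a point."""
    return {
        "q": text,
        "fq": "{!geofilt}",          # geospatial filter query
        "sfield": "store_location",  # field of type `location`
        "pt": f"{lat},{lon}",        # center point
        "d": radius_km,              # distance in km
        "facet": "true",
        "facet.field": "category",
        "facet.mincount": 1,
        "rows": 10,
        "wt": "json",
    }

params = build_faceted_geo_query("running shoes", 52.52, 13.405, 25)
# requests.get("http://localhost:8983/solr/products/select", params=params)
```

The `{!geofilt}` filter combined with `sfield`/`pt`/`d` is what gives you "within N km of a point" semantics without touching the relevance score of the text match.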
OpenSearch
Powerful for real-time analytics and full-text search. Great for log analysis, monitoring dashboards, and building search applications that need both search and analytical capabilities.
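Because OpenSearch answers a query and an aggregation in one round trip, a single request can power both the result list and a dashboard widget. A minimal sketch, assuming hypothetical `message` and `log_level` fields on a log index:

```python
# Sketch of an OpenSearch request body combining full-text search with
# a terms aggregation. The `message` and `log_level` field names are
# assumptions; substitute your own mapping.

def build_search_with_agg(term):
    return {
        "query": {
            "match": {"message": {"query": term, "operator": "and"}}
        },
        "aggs": {
            # bucket the matching documents by severity for a dashboard panel
            "per_level": {"terms": {"field": "log_level", "size": 5}}
        },
        "size": 20,
    }

body = build_search_with_agg("connection timeout")
# client.search(index="app-logs", body=body) with opensearch-py
```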
Apache Kafka
Essential for building real-time data pipelines that feed search engines. Perfect for streaming data updates to maintain fresh search indexes without full rebuilds.
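The pattern of streaming updates without full rebuilds boils down to turning each change event into a partial update against the index. Here is a minimal sketch: the event shape and topic name are hypothetical, and the consumer wiring is only shown in a comment because it needs a running broker. The `{"set": ...}` syntax is Solr's atomic-update format.

```python
import json

def to_atomic_update(event_bytes):
    """Convert a change event (JSON bytes from a Kafka topic) into a
    Solr atomic-update document that patches only the changed fields,
    so the record is not fully re-indexed."""
    event = json.loads(event_bytes)
    doc = {"id": event["id"]}
    for field, value in event["changes"].items():
        doc[field] = {"set": value}  # Solr atomic-update modifier
    return doc

# In production this sits inside a consumer loop, e.g. with kafka-python:
#   for msg in KafkaConsumer("product-updates", ...):
#       solr.add([to_atomic_update(msg.value)])

update = to_atomic_update(b'{"id": "sku-1", "changes": {"price": 19.99}}')
# update == {"id": "sku-1", "price": {"set": 19.99}}
```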
Python & Pandas
Core tools for data preprocessing, cleaning, and transformation before indexing. Pandas makes it easy to analyze query logs and search performance metrics.
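A typical query-log pass in Pandas is only a few lines: top queries, zero-result rate, and tail latency. A toy sketch with an inline DataFrame standing in for a real log file:

```python
import pandas as pd

# Toy query log; a real one would come from pd.read_csv / pd.read_json.
log = pd.DataFrame({
    "query": ["solr facets", "kafka", "solr facets", "geo search", "kafka"],
    "num_results": [42, 0, 42, 7, 0],
    "latency_ms": [31, 12, 28, 55, 14],
})

top_queries = log["query"].value_counts()          # most frequent queries
zero_result_rate = (log["num_results"] == 0).mean()  # share returning nothing
p95_latency = log["latency_ms"].quantile(0.95)     # tail latency

print(top_queries.head())
print(f"zero-result rate: {zero_result_rate:.0%}")
```

The zero-result rate is usually the first number I look at: it tells you directly where the index or the analyzer chain is failing users.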
Data Sources
Common Crawl
Massive repository of web crawl data, perfect for building large-scale search indexes and analyzing web content patterns. Essential for research into web-scale search algorithms and content analysis.
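Getting started is usually a lookup against the Common Crawl CDX index, which returns one JSON line per capture pointing at a WARC record. A sketch of building that request; the crawl label used here is just an example, so pick a current one from the published crawl list:

```python
from urllib.parse import urlencode

def cc_index_url(domain, crawl="CC-MAIN-2024-10"):
    """Build a Common Crawl CDX index query for all captures under a
    domain. Each JSON result line identifies a WARC file plus the byte
    range at which the capture can be fetched."""
    params = {"url": f"{domain}/*", "output": "json"}
    return f"https://index.commoncrawl.org/{crawl}-index?" + urlencode(params)

url = cc_index_url("example.com")
```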
GeoNames
Comprehensive geographical database with over 12 million place names. Invaluable for implementing location-based search features and geographic query expansion in search applications.
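GeoNames ships as tab-separated dumps, and the alternate-names column is what powers query expansion ("Köln" matching "Cologne"). A sketch of parsing the leading columns of a record plus a haversine helper for ranking places by distance; the sample line is illustrative:

```python
import math

# Leading columns of a GeoNames record (tab-separated):
# geonameid, name, asciiname, alternatenames, latitude, longitude, ...
def parse_place(line):
    geonameid, name, ascii_name, alternates, lat, lon = line.split("\t")[:6]
    return {
        "id": int(geonameid),
        "name": name,
        # alternate names drive query expansion across languages
        "aliases": alternates.split(",") if alternates else [],
        "lat": float(lat),
        "lon": float(lon),
    }

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km, for ranking places near a query point."""
    r = 6371.0  # mean Earth radius in km
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))

place = parse_place("2950159\tBerlin\tBerlin\tBerlino,Berlín\t52.52437\t13.41053")
d = haversine_km(48.8566, 2.3522, 51.5074, -0.1278)  # Paris -> London, ~344 km
```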
Nominatim
OpenStreetMap-based geocoding service that provides excellent address and location search capabilities. Perfect for integrating geographic search functionality into applications.
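A forward-geocoding request to Nominatim is just a URL with a few query parameters. A sketch of building one; note that the public instance's usage policy requires a descriptive User-Agent and roughly one request per second, so this only constructs the URL rather than sending anything:

```python
from urllib.parse import urlencode

def nominatim_search_url(query, limit=5):
    """Build a Nominatim /search URL for forward geocoding.
    `jsonv2` returns structured results; `addressdetails` adds a
    per-component breakdown of each match's address."""
    params = {
        "q": query,
        "format": "jsonv2",
        "limit": limit,
        "addressdetails": 1,
    }
    return "https://nominatim.openstreetmap.org/search?" + urlencode(params)

url = nominatim_search_url("Brandenburg Gate, Berlin")
# Fetch with a proper User-Agent header identifying your application.
```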
GDELT Project
Real-time global event database that monitors news and information sources worldwide. Excellent for building search applications focused on current events and trend analysis.