Software, tools, and data resources I use for search and development.

I get asked a lot about the software and tools I use for search engine architecture, data processing, and development work. Here are my recommendations organized by category.

Hardware

  • 16-inch MacBook Pro, M4, 48GB RAM (2024)

    Essential for running multiple search engine instances, processing large datasets, and handling complex indexing operations. The M4 handles heavy computational loads for search algorithm development without breaking a sweat.

  • Apple Magic Trackpad

    The multi-touch gestures are invaluable for navigating between search interfaces, monitoring dashboards, and managing multiple development environments simultaneously.

  • Herman Miller Aeron Chair

    When you are spending hours optimizing search algorithms and analyzing query performance, proper ergonomics makes a significant difference in productivity.

Development Tools

  • Visual Studio Code

    Perfect for editing Solr configurations, OpenSearch mappings, and search query JSON. The syntax highlighting and multi-cursor editing capabilities make managing complex search schemas much more efficient.

  • iTerm2

    Essential for managing multiple search engine instances, running indexing jobs, and monitoring cluster health across different environments. The split panes are invaluable for watching logs while executing commands.

  • TablePlus

    Crucial for examining data before indexing, debugging search results, and analyzing query performance metrics. Makes it easy to understand data relationships when designing search architectures.

Search & Data Platforms

  • Apache Solr

    My go-to search platform for enterprise implementations. Excellent for complex faceted search, geographic queries, and handling large-scale indexing operations with fine-grained control over relevance scoring.
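
    Much of that fine-grained control comes down to query parameters. As a minimal sketch (the "products" collection and the "category"/"brand" fields are hypothetical), a faceted Solr select request can be assembled like this:

```python
from urllib.parse import urlencode

# Build a faceted Solr select URL. The collection and field names here
# ("products", "category", "brand") are placeholders for illustration.
def solr_facet_url(base, collection, query, facet_fields, rows=10):
    params = [
        ("q", query),
        ("rows", str(rows)),
        ("facet", "true"),
        ("facet.mincount", "1"),
    ]
    # Solr accepts facet.field repeated once per facet in a single request.
    for field in facet_fields:
        params.append(("facet.field", field))
    return f"{base}/{collection}/select?{urlencode(params)}"

url = solr_facet_url("http://localhost:8983/solr", "products",
                     "title:laptop", ["category", "brand"])
```

    The response then carries facet counts alongside the hits, which is what makes drill-down UIs cheap to build.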

  • OpenSearch

    Powerful for real-time analytics and full-text search. Great for log analysis, monitoring dashboards, and building search applications that need both search and analytical capabilities.
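
    The "search plus analytics" combination shows up directly in the request body: one query can return matching documents and aggregated counts together. A hedged sketch, with a hypothetical "app-logs" index and "message"/"level" fields:

```python
import json

# One request body combining full-text search with a terms aggregation.
# The index name ("app-logs") and fields ("message", "level") are
# hypothetical examples.
query = {
    "query": {"match": {"message": "timeout"}},
    "aggs": {
        "by_level": {"terms": {"field": "level", "size": 5}}
    },
    "size": 10,
}
body = json.dumps(query)
# POST this to http://localhost:9200/app-logs/_search
# (or pass the dict to opensearch-py's client.search()).
```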

  • Apache Kafka

    Essential for building real-time data pipelines that feed search engines. Perfect for streaming data updates to maintain fresh search indexes without full rebuilds.
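
    The shape of those streamed updates matters more than the plumbing. As a sketch (the "search-updates" topic name is hypothetical), each event carries just the changed fields, and a consumer on the search side applies them as partial updates rather than reindexing whole documents:

```python
import json
import time

# Serialize a partial index update for a hypothetical "search-updates"
# topic. The consumer applies the changed fields to the existing document.
def make_update_event(doc_id, fields):
    event = {"id": doc_id, "fields": fields, "ts": time.time()}
    return json.dumps(event).encode("utf-8")

payload = make_update_event("doc-42", {"title": "New title", "in_stock": True})

# Publishing with kafka-python (assumed installed) would look like:
#   from kafka import KafkaProducer
#   producer = KafkaProducer(bootstrap_servers="localhost:9092")
#   producer.send("search-updates", key=b"doc-42", value=payload)
#   producer.flush()
```

    Keying by document ID keeps updates for the same document in order within a partition.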

  • Python & Pandas

    Core tools for data preprocessing, cleaning, and transformation before indexing. Pandas makes it easy to analyze query logs and search performance metrics.
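
    A toy example of the query-log analysis (the columns here are assumptions, not a standard log format): rank queries by volume and latency, and surface zero-result queries, which are usually the first candidates for synonym or spelling fixes.

```python
import pandas as pd

# Toy query log with hypothetical columns: query, latency_ms, num_results.
logs = pd.DataFrame({
    "query": ["laptop", "laptop", "phone", "xyzzy", "phone"],
    "latency_ms": [12, 18, 25, 9, 30],
    "num_results": [120, 118, 80, 0, 79],
})

# Top queries by volume, with median latency per query.
stats = (logs.groupby("query")
             .agg(count=("query", "size"),
                  median_latency=("latency_ms", "median"))
             .sort_values("count", ascending=False))

# Queries that returned nothing at all.
zero_results = logs[logs["num_results"] == 0]["query"].unique()
```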

Data Sources

  • Common Crawl

    Massive repository of web crawl data, perfect for building large-scale search indexes and analyzing web content patterns. Essential for research into web-scale search algorithms and content analysis.
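
    You rarely start by downloading WARC files; Common Crawl publishes a CDX index queryable over HTTP. A sketch of building such a query ("CC-MAIN-2024-10" is one crawl snapshot; the current list is published at index.commoncrawl.org):

```python
from urllib.parse import urlencode

# Build a Common Crawl CDX index query for captures matching a URL pattern.
# "CC-MAIN-2024-10" names one crawl snapshot; check index.commoncrawl.org
# for the current list.
def cc_index_url(url_pattern, crawl="CC-MAIN-2024-10"):
    params = urlencode({"url": url_pattern, "output": "json"})
    return f"https://index.commoncrawl.org/{crawl}-index?{params}"

query_url = cc_index_url("example.com/*")
```

    Each line of the JSON response points at a WARC file, offset, and length, so individual records can be range-fetched without pulling whole archives.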

  • GeoNames

    Comprehensive geographical database with over 12 million place names. Invaluable for implementing location-based search features and geographic query expansion in search applications.
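
    Query expansion with GeoNames mostly means mapping a place name to its alternate names. A minimal sketch over a single row in the tab-separated allCountries.txt layout (geonameid, name, asciiname, alternatenames, latitude, longitude, ...); in practice you would load these into a lookup table keyed by normalized name:

```python
# One GeoNames-style row in the allCountries.txt tab-separated layout:
# geonameid, name, asciiname, comma-separated alternatenames, lat, lng.
sample_row = "2950159\tBerlin\tBerlin\tBerlino,Berlín,Берлин\t52.52437\t13.41053"

cols = sample_row.split("\t")
name, alternates = cols[1], cols[3].split(",")

def expand_query(q):
    # If the query matches a known place, search all its name variants.
    if q.lower() == name.lower():
        return [name] + alternates
    return [q]
```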

  • Nominatim

    OpenStreetMap-based geocoding service that provides excellent address and location search capabilities. Perfect for integrating geographic search functionality into applications.
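
    Integrating it is largely a matter of one HTTP endpoint. A sketch of building a search request (note the public instance's usage policy requires a descriptive User-Agent header):

```python
from urllib.parse import urlencode

# Build a Nominatim search request against the public instance.
# Per the usage policy, real requests need a descriptive User-Agent.
def nominatim_url(query, limit=1):
    params = urlencode({"q": query, "format": "jsonv2", "limit": limit})
    return f"https://nominatim.openstreetmap.org/search?{params}"

nom_url = nominatim_url("Brandenburg Gate, Berlin")
# Fetch with e.g. requests.get(nom_url, headers={"User-Agent": "my-app/1.0"})
```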

  • GDELT Project

    Real-time global event database that monitors news and information sources worldwide. Excellent for building search applications focused on current events and trend analysis.
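
    For current-events search, the GDELT DOC 2.0 API is the usual entry point: it returns recent news coverage matching a query. A hedged sketch of building such a request (check the GDELT docs for the full parameter set):

```python
from urllib.parse import urlencode

# Build a GDELT DOC 2.0 API request for recent articles matching a query.
# Parameter names follow the documented API; see the GDELT docs for the
# full set of modes and formats.
def gdelt_url(query, maxrecords=10):
    params = urlencode({"query": query, "mode": "ArtList",
                        "format": "json", "maxrecords": maxrecords})
    return f"https://api.gdeltproject.org/api/v2/doc/doc?{params}"

news_url = gdelt_url("climate change")
```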