Job Title: Founding Search Engineer
Location: Hybrid – Dallas Area, Texas (Flexible)
Salary: Competitive, based on experience, plus equity (early-stage AI startup)
About Us - Sail AI: Building Intelligent Discovery Platforms
Sail AI is an early-stage startup on a mission to revolutionize discovery
through AI-driven SaaS platforms for consumer and enterprise markets. Our
flagship product is an AI-powered lifestyle recommendation engine that
integrates robust web crawling, semantic search, advanced vector indexing, and
real-time personalization. We tackle complex search and recommendation
challenges using AI, large language models (LLMs), and knowledge-based
systems.
The Opportunity: Own Our Data and Search Foundation
We’re seeking a Founding Search Engineer to build and own our end-to-end
data infrastructure and search capabilities from scratch. As a core
member of our team, you’ll be responsible for architecting advanced web
crawlers, developing sophisticated indexing strategies, and building knowledge
graph architectures that power our AI-driven recommendations. If you’re
looking for high-impact work, early-stage equity, and the chance to shape
product decisions, this role is for you.
What You'll Do:
* Web Crawling & Data Ingestion: Build distributed, resilient web crawlers from scratch to handle complex, dynamic websites. Implement techniques for handling anti-bot measures (IP rotation, CAPTCHA solving) to improve crawler success rates. Develop data ingestion pipelines for continuous harvesting and normalization of web data.
* Advanced Search & Indexing Systems: Integrate and optimize Elasticsearch/OpenSearch with vector databases (e.g., Pinecone, Weaviate) to enable hybrid semantic + keyword search. Create cutting-edge indexing strategies, including vector-based embeddings and graph-based approaches. Collaborate with AI and Full-Stack engineers to fine-tune LLM-driven ranking algorithms and recommendation models.
* Knowledge Graph Architecture: Design and implement scalable knowledge graphs and vector indexing systems to unify structured and unstructured data. Establish data lineage and governance frameworks to ensure reproducibility and traceability. Orchestrate multi-step AI workflows using tools such as LangChain and LangGraph.
* Early-Stage Data Infrastructure: Architect the foundational data infrastructure in the cloud (AWS, Azure, or GCP) with a focus on scalability and cost-efficiency. Champion security and compliance for sensitive data through encryption, access controls, and adherence to regulations. Drive CI/CD best practices (GitHub Actions, Jenkins) and containerization (Docker, Kubernetes) for frictionless deployments.
* High-Impact Collaboration: Own critical product features end-to-end, shaping the intelligence that powers our recommendations. Work closely with product, AI, and Full-Stack engineers to align technical roadmaps with business goals. Contribute to agile sprints and strategic roadmap discussions, directly influencing company growth and trajectory.
Core Technical Competencies (REQUIRED):
* Crawl & Scrape Public, API, and Social Media Data
  * Develop robust scraping/crawling solutions (e.g., Scrapy, Selenium) with anti-bot tactics.
  * Automate data extraction from diverse sources: public websites, APIs, social feeds.
* Build Data Pipelines & Knowledge Base
  * Ingest, transform, and clean raw data (ETL/ELT workflows).
  * Store curated data in databases, knowledge graphs, or data lakes.
  * Maintain data lineage, versioning, and integrity for downstream usage.
* Create Vectors & Indexes
  * Generate embedding vectors (e.g., BERT/GPT) for semantic retrieval.
  * Construct indexes via Elasticsearch/OpenSearch or vector DBs (Milvus, Pinecone).
  * Integrate knowledge-graph or hybrid indexing strategies for advanced queries.
* Train & Fine-Tune Models (RAG, Custom Fine-Tuning)
  * Use the curated knowledge base for retrieval-augmented generation.
  * Fine-tune large language models with domain-specific data.
  * Continuously iterate for relevance, performance, and accuracy.
What You'll Bring:
* Education: Bachelor’s in Computer Science, Data Engineering, or a related field (Master’s/PhD a plus).
* Experience: 5+ years in data engineering with a focus on search, web scraping, or knowledge systems.
* Technical Skills:
  * Proficiency in Python (Scrapy, FastAPI) or Java (Spring Boot).
  * Expertise in Elasticsearch/OpenSearch, vector databases, and anti-bot/distributed crawling techniques.
  * Skilled with CI/CD pipelines (GitHub Actions, Jenkins) and containerization (Docker, Kubernetes).
* Soft Skills: Strong ownership mentality, problem-solving, and cross-team communication. Comfortable in a fast-paced, early-stage startup environment.
Bonus Points For:
* Familiarity with LLM orchestration (LangChain, LlamaIndex) and recommendation systems.
* Hands-on experience with knowledge graphs (Neo4j, Amazon Neptune) or graph analytics.
* Cloud certifications (GCP, AWS, Azure).
Why Join Sail AI?
* Founding Role & Equity: Own a meaningful stake in our high-growth AI startup.
* High-Impact Problems: Tackle cutting-edge challenges in web crawling, semantic search, and advanced knowledge representation.
* Innovative Technology: Work with the latest in LLMs, vector search, and knowledge graph engineering.
* Collaborative Culture: Join a passionate team that values curiosity, creativity, and results.
* Flexibility: Hybrid work setup with the freedom to shape your schedule.
Ready to Build the Future of Intelligent Discovery?
If you are a talented and driven Search Engineer looking for a high-impact
founding role in an early-stage AI startup, we encourage you to apply!
Apply Now on Wellfound!