Multi-source search automation

  • 2025-05-30
  • Backend
  • PostgreSQL, Python, Pandas, Elasticsearch

Backend solution in Python to automate item searches across multiple information sources, including third-party portals via API, public sites via web scraping, OpenAI search services, and internal databases processed with ETL.
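
To illustrate the federated part of this design, here is a minimal Python sketch of a fan-out search. Each source is modeled as a plain function that takes the query and returns result dictionaries; all sources are queried concurrently, and the results are merged and deduplicated by title. The adapter signature, the `title` field, and the deduplication rule are assumptions for illustration, not the project's actual code.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List

# Hypothetical adapter signature: takes the query text, returns a list of result dicts.
SearchSource = Callable[[str], List[dict]]


def federated_search(query: str, sources: Dict[str, SearchSource],
                     timeout: float = 10.0) -> List[dict]:
    """Query all sources concurrently, tag results with their source, and deduplicate."""
    merged: List[dict] = []
    with ThreadPoolExecutor(max_workers=max(1, len(sources))) as pool:
        futures = {name: pool.submit(fn, query) for name, fn in sources.items()}
        for name, future in futures.items():
            try:
                # Wait up to `timeout` seconds for this source's results.
                results = future.result(timeout=timeout)
            except Exception:
                # A failing or slow source is skipped instead of failing the whole search.
                continue
            merged.extend({**item, "source": name} for item in results)

    # Keep the first result for each normalized title; untitled results are dropped.
    seen = set()
    unique: List[dict] = []
    for item in merged:
        key = " ".join(str(item.get("title", "")).lower().split())
        if key and key not in seen:
            seen.add(key)
            unique.append(item)
    return unique
```

In practice each adapter would wrap an API client, a scraper, or a database query. Keeping them behind one callable signature makes it easy to add or disable a source without touching the merge logic.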

My role

As a backend developer, I was responsible for building integrations with external and internal sources, developing ETL processes and SQL queries to feed search indexes in Elasticsearch, and exposing the endpoints required for communication with the web frontend.

  • Database queries: Developed SQL queries to extract relevant historical data from internal sources.
  • Extract, transform, and load: Built ETL processes with Python and Pandas to transform and load structured data into Elasticsearch indexes.
  • Backend development: Implemented API endpoints to connect the backend with the web frontend.
  • Integration development: Integrated third-party portals via API and public sites via web scraping to enrich the result set.
  • Search data refresh: Automated the update and refresh processes for data stored in PostgreSQL and Elasticsearch.
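
The extract, transform, and load steps above can be sketched as follows. This is a hedged example under several assumptions: a hypothetical `items` table whose `updated_at` column serves as an incremental-refresh watermark, `sqlite3` standing in for PostgreSQL so the code runs without a server, and plain Python in place of the Pandas transformations. The output is a newline-delimited JSON body in the format accepted by the Elasticsearch `_bulk` API.

```python
import json
import sqlite3
from typing import Iterable, List, Tuple

# Hypothetical schema: items(id, name, category, updated_at) with ISO date strings.
EXTRACT_SQL = """
    SELECT id, name, category, updated_at
    FROM items
    WHERE updated_at > ?
    ORDER BY updated_at
"""


def extract(conn: sqlite3.Connection, watermark: str) -> List[Tuple]:
    """Pull only rows changed since the last run (incremental refresh)."""
    return conn.execute(EXTRACT_SQL, (watermark,)).fetchall()


def transform(rows: Iterable[Tuple]) -> List[dict]:
    """Clean rows into search documents: trim text, normalize category, drop empty names."""
    docs = []
    for item_id, name, category, updated_at in rows:
        name = (name or "").strip()
        if not name:
            continue
        docs.append({
            "id": item_id,
            "name": name,
            "category": (category or "uncategorized").strip().lower(),
            "updated_at": updated_at,
        })
    return docs


def to_bulk_ndjson(docs: Iterable[dict], index: str) -> str:
    """Build a _bulk body: one action line plus one source line per document."""
    lines = []
    for doc in docs:
        lines.append(json.dumps({"index": {"_index": index, "_id": doc["id"]}}))
        lines.append(json.dumps(doc))
    # The _bulk API expects the body to end with a newline.
    return "\n".join(lines) + "\n" if lines else ""
```

After each run, the largest `updated_at` value among the extracted rows would be stored as the next watermark, so scheduled jobs only reprocess changed records.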

Key project features

  • Federated, intelligent search across multiple sources from a single interface.
  • Matching and relevance algorithms implemented with Elasticsearch.
  • Automatic source refresh via scheduled ETL jobs.
  • Integration with strategic partners and public sources through scraping and APIs.
  • REST endpoints exposed for the web frontend.
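
Relevance tuning in Elasticsearch is mostly expressed in the Query DSL. The function below is a minimal sketch of one possible query: a `multi_match` across fields with a boost on the name and `AUTO` fuzziness to tolerate typos, an extra `match_phrase` clause that rewards exact phrases, and an optional `term` filter. The field names and boost values are hypothetical, and the `term` filter assumes `category` is indexed as a keyword field; the project's actual matching rules are not described here.

```python
from typing import Optional


def build_search_query(text: str, category: Optional[str] = None, size: int = 20) -> dict:
    """Build a Query DSL body for a relevance-ranked item search (hypothetical fields)."""
    query: dict = {
        "bool": {
            # Main relevance signal: match across fields, weighting "name" 3x,
            # with fuzziness so small typos still match.
            "must": [{
                "multi_match": {
                    "query": text,
                    "fields": ["name^3", "description"],
                    "fuzziness": "AUTO",
                }
            }],
            # Extra score when the words appear as an exact phrase in the name.
            "should": [{"match_phrase": {"name": {"query": text, "boost": 2}}}],
        }
    }
    if category:
        # Filter clauses narrow the result set without affecting the score.
        query["bool"]["filter"] = [{"term": {"category": category.lower()}}]
    return {"query": query, "size": size}
```

The returned dictionary can be sent as the JSON body of a `POST /<index>/_search` request. Putting the category in `filter` rather than `must` narrows results without changing scores, and Elasticsearch can cache filter clauses.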

Impact achieved

The project delivered a robust solution for automated search across multiple data sources, with fast, precise results driven by Elasticsearch indexing and relevance ranking. Users get more complete, contextual results from a single interface, which reduces manual search time and centralizes public, private, and partner data sources.

Technologies

Technology | Use / Implementation
Python | Backend development, ETL, scraping, and integration logic.
SQL | Extraction of historical data to feed search processes.
FastAPI | Creation of REST APIs used to communicate with partner portals and the web application.
Pandas | Data transformation and cleaning before indexing.
PostgreSQL | Storage of transformed data managed by ETL processes.
Elasticsearch | Indexing and execution of intelligent search algorithms.
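
For the public sites integrated through scraping, a minimal parser can be written with the standard library's `html.parser`. The sketch below collects titles and absolute URLs from links marked with a hypothetical `item-link` class; real sources need selectors matched to their actual markup, and a production scraper would also handle fetching, rate limiting, and each site's terms of use.

```python
from html.parser import HTMLParser
from typing import List, Optional
from urllib.parse import urljoin


class ItemLinkParser(HTMLParser):
    """Collect {"title", "url"} dicts from <a class="item-link" href="..."> elements."""

    def __init__(self, base_url: str) -> None:
        super().__init__()
        self.base_url = base_url
        self.items: List[dict] = []
        self._href: Optional[str] = None  # set while inside a matching <a>
        self._text: List[str] = []

    def handle_starttag(self, tag, attrs):
        attr_map = dict(attrs)
        classes = (attr_map.get("class") or "").split()
        if tag == "a" and "item-link" in classes and attr_map.get("href"):
            self._href = attr_map["href"]
            self._text = []

    def handle_data(self, data):
        # Accumulate text only inside a matching link (including nested tags like <b>).
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            title = " ".join("".join(self._text).split())  # collapse whitespace
            if title:
                # Resolve relative links against the page URL.
                self.items.append({"title": title, "url": urljoin(self.base_url, self._href)})
            self._href = None


def parse_items(html: str, base_url: str) -> List[dict]:
    """Parse one HTML page and return the matching item links."""
    parser = ItemLinkParser(base_url)
    parser.feed(html)
    parser.close()
    return parser.items
```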

Project flow

Figure: Multi-source search architecture flow