A Python backend that automates item searches across multiple information sources: third-party portals via API, public sites via web scraping, OpenAI search services, and internal databases processed through ETL pipelines.
My role
As a backend developer, I was responsible for building integrations with external and internal sources, developing ETL processes and SQL queries to feed search indexes in Elasticsearch, and exposing the endpoints required for communication with the web frontend.
- Database queries: Developed SQL queries to extract relevant historical data from internal sources.
- Extract, transform, and load: Built ETL processes with Python and Pandas to transform and load structured data into Elasticsearch indexes.
- Backend development: Implemented API endpoints to connect the backend with the web frontend.
- Integration development: Integrated third-party portals via API and scraping to enrich the result set.
- Data refresh automation: Automated the update and refresh processes that keep the data stored in PostgreSQL and Elasticsearch current.
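The transform-and-load step described above can be sketched roughly as follows. This is a minimal illustration, not the project's code: plain Python stands in for the Pandas cleaning step so the sketch has no dependencies, and the field names (`item_id`, `name`, `source`), the index name `items`, and the helper names are all hypothetical. The output dictionaries follow the action shape (`_index`, `_id`, `_source`) accepted by the Elasticsearch Python client's bulk helper.

```python
from typing import Iterable, Iterator


def clean_rows(rows: Iterable[dict]) -> list[dict]:
    """Normalize names, drop rows without an id, keep the last row seen per id."""
    cleaned: dict[str, dict] = {}
    for row in rows:
        item_id = row.get("item_id")
        if item_id is None:
            continue  # a row without a stable identifier cannot be indexed
        name = row.get("name") or ""
        cleaned[str(item_id)] = {
            "item_id": str(item_id),
            "name": " ".join(name.split()),  # trim and collapse whitespace
            "source": row.get("source", "internal"),
        }
    return list(cleaned.values())


def to_bulk_actions(docs: Iterable[dict], index_name: str) -> Iterator[dict]:
    """Wrap each cleaned document in a bulk-indexing action."""
    for doc in docs:
        yield {"_index": index_name, "_id": doc["item_id"], "_source": doc}


# Hypothetical rows, as they might come back from a SQL extraction query.
rows = [
    {"item_id": 1, "name": "  Steel   bolt M8 "},
    {"item_id": None, "name": "orphan row"},
    {"item_id": 1, "name": "Steel bolt M8 (rev. B)"},
    {"item_id": 2, "name": "Copper washer", "source": "partner"},
]
actions = list(to_bulk_actions(clean_rows(rows), "items"))
```

Using the item identifier as `_id` makes a scheduled re-run overwrite existing documents instead of duplicating them, which is one common way to keep refresh jobs idempotent.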
Key project features
- Federated, intelligent search across multiple sources from a single interface.
- Matching and relevance algorithms implemented with Elasticsearch.
- Automatic source refresh via scheduled ETL jobs.
- Integration with strategic partners and public sources through scraping and APIs.
- REST endpoints exposed for the web frontend.
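One way to picture the federated search in the list above is a fan-out over several sources followed by a merge. The sketch below is an assumption-laden illustration rather than the delivered implementation: the three source functions are hypothetical stand-ins for the Elasticsearch index, a partner API, and a scraped public site, and it assumes scores have already been normalized to a common scale, which real sources would not guarantee.

```python
import asyncio
from typing import Awaitable, Callable

# A source takes a query string and returns a list of hits.
Source = Callable[[str], Awaitable[list[dict]]]


async def search_internal(query: str) -> list[dict]:
    # Stand-in for a query against the internal search index.
    return [{"id": "a1", "title": query, "score": 0.9, "source": "internal"}]


async def search_partner(query: str) -> list[dict]:
    # Stand-in for a partner portal API call.
    return [
        {"id": "a1", "title": query, "score": 0.7, "source": "partner"},
        {"id": "p7", "title": query, "score": 0.8, "source": "partner"},
    ]


async def search_scraped(query: str) -> list[dict]:
    # Stand-in for a public site that fails to respond.
    raise TimeoutError("public site did not respond")


async def federated_search(query: str, sources: list[Source]) -> list[dict]:
    """Query all sources concurrently, skip failures, keep the best hit per id."""
    results = await asyncio.gather(
        *(source(query) for source in sources), return_exceptions=True
    )
    best: dict[str, dict] = {}
    for result in results:
        if isinstance(result, BaseException):
            continue  # one failing source should not break the whole search
        for hit in result:
            current = best.get(hit["id"])
            if current is None or hit["score"] > current["score"]:
                best[hit["id"]] = hit
    return sorted(best.values(), key=lambda hit: hit["score"], reverse=True)


hits = asyncio.run(
    federated_search("steel bolt", [search_internal, search_partner, search_scraped])
)
```

Collecting exceptions instead of raising them lets the interface return partial results when a scraped site or partner portal is unavailable.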
Impact achieved
The project delivered automated search across multiple data sources, with Elasticsearch indexing and relevance scoring providing fast, precise results. Users get more complete, contextual results from a single interface, which reduces manual search time and centralizes public, private, and partner data sources.
Technologies
| Technology | Use / Implementation |
|---|---|
| Python | Backend development, ETL, scraping, and integration logic. |
| SQL | Extraction of historical data to feed search processes. |
| FastAPI | Creation of REST APIs used to communicate with partner portals and the web application. |
| Pandas | Data transformation and cleaning before indexing. |
| PostgreSQL | Storage of transformed data managed by ETL processes. |
| Elasticsearch | Indexing and execution of intelligent search algorithms. |
Project flow

