Financial News Scraper Project

Daily Bulletin Scrapers

Web Scraping • PDF Processing • Selenium Automation • Python

Project Overview

A Python-based scraping system that automatically downloads, parses, and processes daily financial bulletins from major Turkish brokerage firms. It extracts company news items from the PDF documents, matches each item to the ticker symbol of a BIST-traded stock, and aggregates the results into a structured CSV file for financial analysis and market research.

Scraping Pipeline

01

Web Navigation

Selenium WebDriver
Automated browsing
Dynamic content handling

02

PDF Download

Daily bulletin retrieval
Direct downloads
Error handling

03

Content Extraction

PDF text parsing
pdfminer processing
Pattern recognition

04

Ticker Matching

BIST symbol validation
Fuzzy matching
Company identification

05

Data Aggregation

CSV consolidation
Timestamp tracking
Source attribution
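The Data Aggregation step can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the `NewsItem` record, its column names, and the `write_items` helper are hypothetical, since the real CSV layout is not described here.

```python
import csv
import io
from dataclasses import dataclass, asdict, fields
from datetime import datetime


@dataclass
class NewsItem:
    # Hypothetical record layout; the real CSV columns are not given in the source.
    source: str      # brokerage firm the bulletin came from (source attribution)
    ticker: str      # matched BIST ticker symbol
    text: str        # news item extracted from the PDF
    scraped_at: str  # ISO 8601 timestamp of the scrape (timestamp tracking)


def write_items(items, handle):
    """Write news items collected from all scrapers to a single CSV stream."""
    writer = csv.DictWriter(handle, fieldnames=[f.name for f in fields(NewsItem)])
    writer.writeheader()
    for item in items:
        writer.writerow(asdict(item))


# Example data is made up for illustration.
stamp = datetime(2024, 1, 15, 9, 30).isoformat()
items = [
    NewsItem("Oyak Yatırım", "THYAO", "Example news text", stamp),
    NewsItem("Ziraat Yatırım", "GARAN", "Another example", stamp),
]
buffer = io.StringIO()
write_items(items, buffer)
csv_text = buffer.getvalue()
```

Writing to a file instead of an in-memory buffer only requires passing a handle opened with `newline=""` and an explicit encoding such as UTF-8, which keeps Turkish characters in firm names intact.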

Brokerage Firms Coverage

Oyak Yatırım

Direct

Direct PDF download and parsing with automated content extraction

Piramit Menkul

Selenium

Selenium-based navigation and extraction for dynamic content

Tacirler Yatırım

Selenium

Automated web browsing with Selenium for bulletin retrieval

Vakıf Yatırım

Selenium

Selenium-driven handling of dynamically loaded bulletin pages

Ziraat Yatırım

Direct

PDF processing with direct download methodology

Technical Challenges & Solutions

PDF Format Variations

pdfminer

Implemented robust parsing to handle different bulletin layouts and format inconsistencies across firms
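One way to cope with layout differences is to keep a separate line pattern per firm and apply it to the text that pdfminer has already extracted. The sketch below assumes that approach; the firm keys, the two line layouts, and the `extract_items` helper are all hypothetical, because the real bulletin formats are not described here.

```python
import re

# Hypothetical per-firm line patterns. The input is assumed to be plain text
# already extracted from a bulletin PDF (for example with pdfminer).
PATTERNS = {
    # Layout "TICKER: news text"
    "firm_a": re.compile(r"^(?P<ticker>[A-Z]{4,5}):\s*(?P<text>.+)$"),
    # Layout "(TICKER) news text"
    "firm_b": re.compile(r"^\((?P<ticker>[A-Z]{4,5})\)\s*(?P<text>.+)$"),
}


def extract_items(firm, page_text):
    """Return (ticker, text) pairs for every line that fits the firm's layout."""
    pattern = PATTERNS[firm]
    items = []
    for line in page_text.splitlines():
        match = pattern.match(line.strip())
        if match:
            items.append((match.group("ticker"), match.group("text")))
    return items


sample_a = "Günlük Bülten\nTHYAO: New route announced\n\nFooter text"
sample_b = "(GARAN) Dividend approved"
items_a = extract_items("firm_a", sample_a)
items_b = extract_items("firm_b", sample_b)
```

Keeping the patterns in one table means a layout change at a single firm touches one entry rather than the parsing loop.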

Dynamic Web Content

Selenium

Used WebDriver to handle JavaScript-heavy brokerage websites with complex navigation

Ticker Symbol Accuracy

fuzzywuzzy

Applied fuzzy string matching against BIST reference data to map the companies named in each bulletin to their ticker symbols
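The matching idea can be illustrated with a short sketch. Note that it uses the standard library's `difflib.SequenceMatcher` as a stand-in for fuzzywuzzy so that it runs without extra dependencies; the three-entry reference table, the `match_ticker` helper, and the 0.8 threshold are assumptions chosen for illustration.

```python
from difflib import SequenceMatcher

# Tiny illustrative reference table; the project's BIST reference data is
# assumed to be much larger and loaded from elsewhere.
BIST_REFERENCE = {
    "THYAO": "Türk Hava Yolları",
    "GARAN": "Garanti Bankası",
    "AKBNK": "Akbank",
}


def match_ticker(name, reference=BIST_REFERENCE, threshold=0.8):
    """Return the ticker whose company name is most similar to `name`.

    Returns None when no candidate reaches the similarity threshold, so
    weak matches are dropped instead of being assigned a wrong ticker.
    """
    best_ticker, best_score = None, 0.0
    for ticker, company in reference.items():
        score = SequenceMatcher(None, name.casefold(), company.casefold()).ratio()
        if score > best_score:
            best_ticker, best_score = ticker, score
    return best_ticker if best_score >= threshold else None


exact = match_ticker("Akbank")
ascii_variant = match_ticker("Turk Hava Yollari")  # Turkish letters dropped
unknown = match_ticker("Unrelated Holding")
```

The threshold trades recall for precision: raising it rejects more misspelled names, lowering it risks attaching news to the wrong stock.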

Error Resilience

Exception Handling

Added exception handling for PDFSyntaxError (raised by pdfminer on malformed PDFs) and NoSuchElementException (raised by Selenium when an expected page element is missing)
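A possible shape for this handling is sketched below, assuming each scraper is a callable and that a failure in one firm should not stop the others. The `run_scrapers` helper and the firm keys are hypothetical. The stand-in exception classes exist only so the sketch runs when pdfminer and Selenium are not installed.

```python
try:
    from pdfminer.pdfparser import PDFSyntaxError
    from selenium.common.exceptions import NoSuchElementException
except ImportError:
    # Stand-ins so the sketch runs without the third-party libraries.
    class PDFSyntaxError(Exception):
        pass

    class NoSuchElementException(Exception):
        pass


def run_scrapers(scrapers):
    """Run every scraper, collecting items and recording which ones failed."""
    results, failures = [], {}
    for name, scrape in scrapers.items():
        try:
            results.extend(scrape())
        except (PDFSyntaxError, NoSuchElementException) as exc:
            # Record the failure type and continue with the next firm.
            failures[name] = type(exc).__name__
    return results, failures


def good_scraper():
    return ["item"]


def broken_pdf_scraper():
    raise PDFSyntaxError("corrupt bulletin")


def missing_link_scraper():
    raise NoSuchElementException("download link not found")


results, failures = run_scrapers({
    "firm_a": good_scraper,
    "firm_b": broken_pdf_scraper,
    "firm_c": missing_link_scraper,
})
```

Only the two named exception types are caught here, so unexpected errors still surface instead of being silently swallowed.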

Data Consistency

Standardization

Standardized date/time formatting and source attribution across all scraper modules
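Date standardization might look like the sketch below. The list of input formats and the `normalize_date` helper are assumptions; the formats the bulletins actually use are not stated here.

```python
from datetime import datetime

# Hypothetical input formats that different bulletins might use.
KNOWN_FORMATS = ("%d.%m.%Y", "%d/%m/%Y", "%Y-%m-%d")


def normalize_date(raw):
    """Convert a date string in any known format to ISO 8601 (YYYY-MM-DD)."""
    for fmt in KNOWN_FORMATS:
        try:
            return datetime.strptime(raw.strip(), fmt).date().isoformat()
        except ValueError:
            continue  # try the next format
    raise ValueError(f"Unrecognized date format: {raw!r}")


dotted = normalize_date("15.01.2024")
slashed = normalize_date("15/01/2024")
iso = normalize_date(" 2024-01-15 ")
```

Raising on unknown formats, rather than passing the raw string through, keeps inconsistent dates out of the aggregated CSV.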

Website Changes

Modular Design

Modular architecture allows easy updates when brokerage sites modify their structure
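A modular layout along these lines could be built around a shared interface and a registry, as in the sketch below. The class names, the `register` decorator, and the placeholder return values are hypothetical and do not reflect the project's actual module structure.

```python
from abc import ABC, abstractmethod

# Registry mapping a firm name to its scraper class.
SCRAPERS = {}


def register(name):
    """Class decorator that adds a scraper to the registry under `name`."""
    def decorator(cls):
        SCRAPERS[name] = cls
        return cls
    return decorator


class BulletinScraper(ABC):
    """Common interface every firm-specific scraper implements."""

    @abstractmethod
    def fetch_items(self):
        """Return a list of (ticker, text) pairs for today's bulletin."""


@register("Oyak Yatırım")
class OyakScraper(BulletinScraper):
    def fetch_items(self):
        # Placeholder result; a real scraper would download and parse a PDF.
        return [("THYAO", "placeholder item")]


@register("Piramit Menkul")
class PiramitScraper(BulletinScraper):
    def fetch_items(self):
        # Placeholder result; a real scraper would drive Selenium here.
        return [("GARAN", "placeholder item")]


def run_all():
    """Instantiate and run every registered scraper."""
    return {name: cls().fetch_items() for name, cls in SCRAPERS.items()}


all_items = run_all()
```

With this arrangement, a site redesign at one brokerage only requires changing that firm's class, while the aggregation code keeps calling `fetch_items` unchanged.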
