Decades of business journalism, locked in PDFs. We built an AI pipeline that extracted, structured, and indexed 50,000+ articles - making them searchable and ready for the age of AI.
Original magazine page showing the complex layouts our AI pipeline processes
Magazine layouts break standard document processing. Multi-column text, images that split articles, and content that spans pages all called for custom AI.
Magazine pages aren't documents. Articles wrap around images, jump across pages, and share space with ads. Standard OCR gives you garbled soup.
Most AI models struggle with Croatian. Diacritical marks get mangled, context is lost, and business terminology comes out wrong.
Four stages from raw PDF to structured, searchable content.
Automated processing of scanned magazine pages with quality normalization
Computer vision for structure detection, segmentation, and reading order
Text recognition with Croatian language processing and error correction
Converting to searchable, structured format with metadata enrichment
Custom algorithms for the specific challenges of magazine digitization.
Articles that continue on page 47 get automatically connected. Our algorithms track continuity markers and content flow across the entire magazine.
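As a rough illustration of marker-based linking, the sketch below scans text fragments for a "continued on page N" phrase and pairs each one with a fragment on the target page. It is not the production algorithm: the `Fragment` type, the `link_fragments` function, and the Croatian and English marker phrases are all assumptions made for this example. A real pipeline would also have to weigh layout position and content flow.

```python
import re
from dataclasses import dataclass

# Hypothetical marker phrases; a real archive would need its own list.
CONTINUES_ON = re.compile(
    r"(?:nastavak na (?:str\.|stranici)|continued on page)\s*(\d+)",
    re.IGNORECASE,
)


@dataclass
class Fragment:
    page: int   # page number the fragment was found on
    text: str   # recognized text of the fragment


def link_fragments(fragments):
    """Return (source_index, target_index) pairs joined by a continuation marker."""
    # Group fragment indices by page for quick lookup of jump targets.
    by_page = {}
    for i, frag in enumerate(fragments):
        by_page.setdefault(frag.page, []).append(i)

    links = []
    for i, frag in enumerate(fragments):
        match = CONTINUES_ON.search(frag.text)
        if not match:
            continue
        target_page = int(match.group(1))
        candidates = [j for j in by_page.get(target_page, []) if j != i]
        if candidates:
            # Simplification: take the first fragment on the target page.
            links.append((i, candidates[0]))
    return links
```

Picking the first fragment on the target page is the weakest part of this sketch. When several articles share a page, something like topic or vocabulary overlap would be needed to choose between them.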
Pattern recognition extracts and normalizes author information from various byline formats and positions within magazine layouts.
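To make byline normalization concrete, here is a minimal sketch that strips a leading label, splits multiple authors, and normalizes capitalization. The prefix list (`Piše`, `Autor`, `Tekst`, `By` and variants) and the function name `normalize_byline` are assumptions for illustration, and the sketch works on text only. Locating the byline on the page is a separate layout problem that it does not address.

```python
import re

# Hypothetical byline labels; the separator is required so that a name
# such as "Byron" is not mistaken for the label "By".
BYLINE_PREFIX = re.compile(
    r"^\s*(?:piše|pišu|autorica|autor|tekst|by)(?:\s*[:\-]\s*|\s+)",
    re.IGNORECASE,
)

# Authors may be separated by a comma, an ampersand, or Croatian "i" ("and").
AUTHOR_SEPARATOR = re.compile(r"\s*(?:,|&|\bi\b)\s*")


def _capitalize_word(word):
    # Capitalize each part of a hyphenated name separately.
    return "-".join(part.capitalize() for part in word.split("-"))


def normalize_byline(line):
    """Return a list of author names in 'Firstname Lastname' capitalization."""
    names = BYLINE_PREFIX.sub("", line).strip()
    authors = [a.strip() for a in AUTHOR_SEPARATOR.split(names) if a.strip()]
    return [" ".join(_capitalize_word(w) for w in a.split()) for a in authors]
```

For example, `normalize_byline("Piše: IVAN HORVAT")` yields `["Ivan Horvat"]`.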
Machine learning classifies articles by topic, industry, and content type - making the archive instantly navigable by subject.
Every extracted article runs through validation and quality control. Spell checking, format normalization, metadata enrichment, and deduplication happen automatically.
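One simple way to approach the deduplication step is to fingerprint a normalized form of each article body and keep only the first article per fingerprint. The sketch below assumes articles are dictionaries with a `body` key, and the helper names `fingerprint` and `deduplicate` are invented for this example. It catches exact duplicates after normalization only. Near-duplicates with OCR differences would need a fuzzier technique.

```python
import hashlib
import re
import unicodedata


def fingerprint(text):
    """Hash a normalized form of the text so trivial variants collide."""
    # NFC makes composed and decomposed diacritics (e.g. "ž") identical.
    normalized = unicodedata.normalize("NFC", text)
    # Collapse whitespace runs and ignore letter case.
    normalized = re.sub(r"\s+", " ", normalized).strip().casefold()
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def deduplicate(articles):
    """Keep the first article seen for each fingerprint, preserving order."""
    seen = set()
    unique = []
    for article in articles:
        key = fingerprint(article["body"])
        if key not in seen:
            seen.add(key)
            unique.append(article)
    return unique
```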
Every word, every article, instantly searchable. The archive becomes a queryable database, ready for AI applications.
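The idea of a queryable archive can be sketched with a small in-memory inverted index. This is an illustration only, not the system described here: the function names and the diacritic-folding choice are assumptions. Folding lets a query typed without diacritics ("trziste") match the stored word ("tržište"). Note that "đ" has no Unicode decomposition, so it is mapped to "d" explicitly.

```python
import re
import unicodedata
from collections import defaultdict


def fold(text):
    """Lowercase the text and strip diacritics for forgiving matching."""
    # "đ" does not decompose under NFD, so replace it by hand.
    text = text.casefold().replace("đ", "d")
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def tokenize(text):
    return re.findall(r"\w+", fold(text))


def build_index(articles):
    """Map each folded token to the set of article ids containing it."""
    index = defaultdict(set)
    for article_id, body in articles.items():
        for token in tokenize(body):
            index[token].add(article_id)
    return index


def search(index, query):
    """Return ids of articles that contain every token in the query."""
    tokens = tokenize(query)
    if not tokens:
        return set()
    result = set(index.get(tokens[0], set()))
    for token in tokens[1:]:
        result &= index.get(token, set())
    return result
```

At the scale of 50,000+ articles, a dedicated search engine or a database full-text index would be the more likely choice. The in-memory version only shows the principle.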
From locked PDFs to a living, searchable archive.
Decades of Croatian business journalism preserved in a structured, future-proof format. Full-text search across the entire archive.
The structured archive now powers Pitaj Lider - the AI assistant. What was locked in PDFs is now training data for business intelligence.
Our document AI handles complex layouts, multiple languages, and archives of any size. Let's talk about your digitization project.