Artirix — EMIS

EMERGING MARKETS INFORMATION SERVICE

Emerging Markets Information Service (EMIS) delivers deep, rich company and industry information, alongside the relevant proprietary and multi-source news, research and analytics that allow professionals to make profitable decisions faster.

This single resource of hard-to-get information covers more than 100 emerging markets, includes company profiles and financials from more than 1.3 million listed and private companies, offers single company and industry analysis, and delivers proprietary and multi-source news and research from over 9,000 publications, all delivered via an easy-to-use interface.

The challenge

The EMIS team came to us because their legacy commercial search installation suffered from a poor user experience, poor relevancy, and a high cost of ownership. They wanted to scale for more dynamic and personalised content, and needed a backend that would future-proof their growth.

The result

Internet Securities Inc. (ISI, a subsidiary of Euromoney Institutional Investor) and Artirix are proud to announce the launch of a new platform for the Emerging Markets Information Service (EMIS, www.securities.com). The end result is a rich, highly accurate and personalised new application that helps EMIS subscribers discover important news and research more effectively.

The upgraded solution runs on Artirix’s scalable cloud platform. It provides dynamic search and personalised content pages per user, with greatly improved quality in all languages, including Mandarin Chinese. The backend covers over 1.2 terabytes of full text and structured metadata, and is future-proofed for growth. In addition, Artirix’s service model has allowed ISI to reduce costs and operational risk.

“Our subscribers are very demanding, and today search is a core part of our premium service offering. Our goal was to replace our aging legacy search engine with a modern, feature-rich, SaaS solution running in the cloud. Since search presented large and complex architectural challenges, it made sense to work with Artirix. Their deep expertise configuring and tuning Elastic Search delivered what we needed. The new solution has been well received from our global subscribers, and has removed the operational burden associated with search.”

— Carl Blake, VP, Head of Technology @ EMIS.

The Solution

In summary, the Artirix platform for this service supports more than 200 million financial news articles, which equates to a 1.2TB index in Elasticsearch, the core search technology in our platform.

It serves free-text queries in any one of 15 languages, combined with structured filters, and the user interface uses around 10 facets.

Under the hood of our platform we combined several technologies to support this scale:

Artirix document processing – to analyse, augment and normalize inbound data
MongoDB – for storage alongside Elasticsearch
RabbitMQ – to manage the data flow into 2 separate clusters, together with a custom river for Elasticsearch
Artirix Cottontail – to carry index status updates from RabbitMQ into MongoDB
Elasticsearch – for the core search system, with some customizations for analysis and query extensions
Customer Query API – to translate textual queries into the Elasticsearch DSL
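
As a rough illustration of what translating a textual request into the Elasticsearch DSL can look like, here is a minimal Python sketch. It is not the Artirix implementation: the function name, the per-language field naming (`body_en`), the filter and facet field names, and the use of `aggs` (the modern replacement for the facets API) are all assumptions made for the example.

```python
from typing import Any


def build_search_body(text: str, language: str, filters: dict[str, str],
                      facet_fields: list[str]) -> dict[str, Any]:
    """Translate a simple textual request into an Elasticsearch query DSL body.

    Hypothetical helper: field names such as "body_en" are illustrative only.
    """
    return {
        "query": {
            "bool": {
                # Free-text part: a scored match on the field for the chosen language.
                "must": [{"match": {f"body_{language}": text}}],
                # Structured filters: exact term clauses that do not affect scoring.
                "filter": [
                    {"term": {field: value}}
                    for field, value in sorted(filters.items())
                ],
            }
        },
        # One terms aggregation per facet shown in the user interface.
        "aggs": {name: {"terms": {"field": name}} for name in facet_fields},
    }


body = build_search_body(
    "central bank rates", "en", {"country": "BR"}, ["industry", "source"]
)
```

Keeping the translation in one place means the user-facing query syntax can stay stable even if the underlying DSL changes between Elasticsearch versions.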

BENCHMARKING & HOSTING

To ensure the service met the expected volume and query loads, we benchmarked extensively on Amazon EC2, testing various instance types against the number of shards per index to find the best balance of cost and performance. In the end we settled on SSD-backed instances from Amazon.

APIS

The service is accessed via REST APIs – an index API for adding and deleting documents, and a query API for searching.

ARCHITECTURE

Parts of the platform architecture were inspired by talks at an Elasticsearch London Meetup, and it is best understood by following a single document through the system:

MongoDB is used as the canonical data store, and holds both the document data and the current status of each document. Documents added to the system are stored in a dual-region MongoDB replica set as soon as possible after they are received by the Artirix Index API. After this, they are pushed onto a RabbitMQ queue, from where they are received by the custom RabbitMQ river running in Elasticsearch. This river indexes the documents, and posts a status message about each document back to RabbitMQ. Finally, the message is picked up by Artirix Cottontail, which updates the status of the document in MongoDB.
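
The journey of a single document can be sketched in a few lines of Python. This is a toy simulation under loud assumptions: plain dictionaries stand in for MongoDB and the Elasticsearch index, `queue.Queue` stands in for the RabbitMQ queues, and every function name and status value is hypothetical rather than taken from the real platform.

```python
import queue

# In-memory stand-ins (assumptions for illustration only).
mongo: dict[str, dict] = {}                 # canonical store: data plus status
index_queue: queue.Queue = queue.Queue()    # documents waiting to be indexed
status_queue: queue.Queue = queue.Queue()   # status messages from the river
search_index: dict[str, str] = {}           # stand-in for Elasticsearch


def index_api_add(doc_id: str, text: str) -> None:
    """Index API: persist the document first, then queue it for indexing."""
    mongo[doc_id] = {"text": text, "status": "received"}
    index_queue.put((doc_id, text))


def river_step() -> None:
    """River: consume one document, index it, publish a status message."""
    doc_id, text = index_queue.get()
    search_index[doc_id] = text
    status_queue.put((doc_id, "indexed"))


def cottontail_step() -> None:
    """Cottontail: copy one status message into the canonical store."""
    doc_id, status = status_queue.get()
    mongo[doc_id]["status"] = status


index_api_add("doc-1", "Central bank holds rates")
river_step()
cottontail_step()
print(mongo["doc-1"]["status"])  # indexed
```

Because the document is written to the canonical store before it is queued, the store always knows about a document even if indexing is delayed, and the status field shows how far each document has travelled.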

We run two independent Elasticsearch clusters. This allows one to be rebuilt in the case of disaster while the other serves search requests.

ELASTICSEARCH

The documents contain data in 15 languages. These are handled by the built-in language analyzers in Elasticsearch. In addition, we use a custom analysis step to index each word in both its stemmed and unstemmed form. This makes it possible, for example, to require exact word matches when performing phrase searches, while using stemming at other times.
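
The custom analysis step itself is not shown here. A comparable effect can be sketched with stock Elasticsearch multi-fields, which is a swapped-in technique rather than the approach described above: the same text is indexed once with a stemming analyzer and once without. The field names `body` and `body.exact` and the helper `text_query` are hypothetical, and the mapping uses current Elasticsearch syntax rather than that of the 2013 release.

```python
# Index definition: "body" is stemmed by the built-in "english" analyzer,
# while the "body.exact" sub-field keeps unstemmed tokens via "standard".
index_body = {
    "mappings": {
        "properties": {
            "body": {
                "type": "text",
                "analyzer": "english",
                "fields": {
                    "exact": {"type": "text", "analyzer": "standard"}
                },
            }
        }
    }
}


def text_query(text: str, phrase: bool) -> dict:
    """Pick the unstemmed sub-field for phrase searches, the stemmed field otherwise."""
    if phrase:
        return {"match_phrase": {"body.exact": text}}
    return {"match": {"body": text}}


phrase_query = text_query("rising rates", phrase=True)
loose_query = text_query("rising rates", phrase=False)
```

With this layout a phrase search for "rising rates" would not match "rise rate", while an ordinary search would still benefit from stemming.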

We also built a custom plugin to allow the use of wildcards in span queries.
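
The plugin's own query syntax is not documented here. For readers who want a similar capability on stock Elasticsearch, the built-in `span_multi` query can wrap a wildcard query so that it takes part in a span query. The sketch below builds such a request body; it is a stand-in for illustration, not the plugin, and the helper name and the `body` field are assumptions.

```python
def span_near_with_wildcard(field: str, exact_term: str,
                            wildcard_pattern: str, slop: int = 0) -> dict:
    """Build a span_near query: an exact term followed by a wildcard match."""
    return {
        "span_near": {
            "clauses": [
                {"span_term": {field: exact_term}},
                # span_multi lets a multi-term query (here: wildcard) act as a span clause.
                {"span_multi": {
                    "match": {"wildcard": {field: {"value": wildcard_pattern}}}
                }},
            ],
            "slop": slop,        # number of intervening positions allowed
            "in_order": True,    # clauses must appear in the given order
        }
    }


query = span_near_with_wildcard("body", "interest", "rate*", slop=1)
```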

VOLUMES

Documents: 204m
MongoDB Index Size: 1.2TB
Elasticsearch Index Size: 1.2TB

EMIS

Year: 2013

Services

Software Development
Data + Content Search
Support + Maintenance
