What's under the hood of an Artirix classified service?

Clients have used our complete set of components to power their classified services and rely on us to continue to deliver through our managed service model. Let us take you behind the scenes of one of our classified services. 

Artirix components can be used to build rich and dynamic web and mobile applications

We take feed data or content

Our feed processing framework is designed to cope with hundreds of different structured data feed formats in massive volumes. Processing in our feed management component is configured through plugins, so we can quickly create new processors, and it is managed through an online management tool. We have customers feeding 200 million documents to us, all handled in the cloud. Built in Python and deployed to Amazon EC2, the framework can scale to very high volumes.


Many of our customers had a mixture of feeds and listings that their advertisers wanted to create and manage themselves. So we built a self-service user interface where advertisers can log in and publish new listings, upgrade them, check out, view leads and view analytics on the performance of their adverts.

In near real-time ... 

Our push API accepts JSON to create, update and delete records, and can run in synchronous or asynchronous mode, with support for callbacks. It supports prioritisation, so urgent content objects can jump the queue. Documents that fail are retried and, if they keep failing, moved to a dead letter queue.
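The prioritisation, retry and dead-letter behaviour can be sketched with Python's `heapq`. This is a simplified, single-process illustration under assumed names (`PushQueue`, `push`, `drain`, `max_attempts`); it leaves out the synchronous/asynchronous modes and callbacks, and a production service would use a durable message queue rather than an in-memory heap.

```python
import heapq
import itertools
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass(order=True)
class _Job:
    priority: int                          # lower number = processed sooner
    seq: int                               # tie-breaker keeps FIFO order within a priority
    op: str = field(compare=False)         # "create", "update" or "delete"
    doc: dict = field(compare=False)
    attempts: int = field(default=0, compare=False)


class PushQueue:
    """Hypothetical in-memory push queue with priorities, retries and a dead letter list."""

    def __init__(self, handler: Callable[[str, dict], None], max_attempts: int = 3):
        self._heap: List[_Job] = []
        self._seq = itertools.count()
        self._handler = handler
        self._max_attempts = max_attempts
        self.dead_letters: List[_Job] = []

    def push(self, op: str, doc: dict, priority: int = 10) -> None:
        if op not in ("create", "update", "delete"):
            raise ValueError(f"unsupported operation {op!r}")
        heapq.heappush(self._heap, _Job(priority, next(self._seq), op, doc))

    def drain(self) -> None:
        """Process jobs in priority order until the queue is empty."""
        while self._heap:
            job = heapq.heappop(self._heap)
            try:
                self._handler(job.op, job.doc)
            except Exception:
                job.attempts += 1
                if job.attempts >= self._max_attempts:
                    # Give up: park the job instead of blocking the queue.
                    self.dead_letters.append(job)
                else:
                    # Re-queue behind all pending jobs of the same priority.
                    job.seq = next(self._seq)
                    heapq.heappush(self._heap, job)
```

Here an urgent update pushed with `priority=0` jumps ahead of jobs at the default `priority=10`, and a job that fails `max_attempts` times ends up in `dead_letters`.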

We enrich and process

Call it “text processing” or “content processing”; it goes by a few different labels. At Artirix we have built a framework to mass-process content objects, with the goal of enriching and normalizing them at scale and speed. The architecture is based on a distribution point feeding many pipeline nodes. Content is routed through a process made up of steps:

  • Entity extraction using dictionary-based or pattern-matching approaches (e.g. item features, proper names, email addresses)
  • Address extraction and normalization – turn messy locations into structured addresses and add geocodes for location-based content
  • Deduplication based on content properties
  • Media download and processing – detect linked media objects, retrieve them to a common store and process them to a common set of properties
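A pipeline node that applies steps like these in order can be sketched as follows. The steps are deliberately simple stand-ins, assumed for illustration: a regular expression for email extraction, whitespace-and-case normalisation in place of real address parsing and geocoding, and a hash of title and description for deduplication. Media download is omitted, and all function names are hypothetical. A step that returns `None` drops the item from the stream.

```python
import hashlib
import re
from typing import Callable, Iterable, Iterator, List, Optional

# A step takes an item and returns it (possibly enriched), or None to drop it.
Step = Callable[[dict], Optional[dict]]

EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")


def extract_emails(item: dict) -> dict:
    # Pattern-matching entity extraction over the description text.
    item["emails"] = EMAIL_RE.findall(item.get("description", ""))
    return item


def normalise_address(item: dict) -> dict:
    # Stand-in for real address parsing: collapse whitespace and title-case.
    item["address"] = " ".join(item.get("address", "").split()).title()
    return item


def make_deduplicator() -> Step:
    seen = set()

    def dedupe(item: dict) -> Optional[dict]:
        # Items with the same title and description (ignoring case) are duplicates.
        text = (item.get("title", "") + "|" + item.get("description", "")).lower()
        key = hashlib.sha1(text.encode("utf-8")).hexdigest()
        if key in seen:
            return None
        seen.add(key)
        return item

    return dedupe


def run_pipeline(items: Iterable[dict], steps: List[Step]) -> Iterator[dict]:
    """Route each item through every step in order, dropping it if a step returns None."""
    for item in items:
        current: Optional[dict] = dict(item)  # copy so the input is not mutated
        for step in steps:
            current = step(current)
            if current is None:
                break
        if current is not None:
            yield current
```

Keeping each step as a plain function makes it easy to reorder steps or distribute them across pipeline nodes.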

Managing the data and advertisers

Alongside our feed and enrichment framework, we've built a CRM and administration framework that associates the supplied “items” with their advertisers. This allows us to track usage data and leads, and report performance back to advertisers. It also allows us to support different business models, such as subscriptions, freemium and paid upgrades. Our components keep items in the index updated with their current advertising state. There is also an advert serving API that lets ads be positioned throughout a site using advert request codes.
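To illustrate how advertising state and advert request codes might fit together, here is a minimal sketch of an advert-serving lookup. The `Advert` fields, the `serve` function and example request codes such as `HOME_TOP` are hypothetical; the point is only that an advert is served for a slot when its request code matches, its advertiser is active in the CRM and its paid period has not expired.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Set


@dataclass
class Advert:
    advert_id: str
    advertiser_id: str
    request_codes: List[str]   # hypothetical slot codes this advert may fill
    expires: date              # end of the paid period (assumed field)


def serve(request_code: str, adverts: List[Advert], active_advertisers: Set[str],
          today: date, limit: int = 2) -> List[str]:
    """Return up to `limit` advert ids eligible for one advert request code."""
    eligible = [
        a for a in adverts
        if request_code in a.request_codes          # advert is booked into this slot
        and a.advertiser_id in active_advertisers   # advertiser account is active
        and a.expires >= today                      # paid period has not ended
    ]
    return [a.advert_id for a in eligible[:limit]]
```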

Start the search

So we have a stream of fed, normalized and enriched content. We then typically store it in MySQL or MongoDB, and an Artirix component is responsible for mirroring it to Elasticsearch. This core part of the architecture is built for scale, with some customers storing in excess of 200 million records. We use Elasticsearch for other cool functions as well, such as analysing trends in the index and storing logs and behavioural search events. This allows us to monitor the platform from both a service and a content analytics perspective (e.g. search response times, zero-result searches, popular queries and time series of content trends in the index).
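One common way to keep an Elasticsearch index mirrored from a primary store is to send batched changes to Elasticsearch's `_bulk` endpoint, which takes newline-delimited JSON: an action line (`index` or `delete`), followed by the document source for `index` actions, with a trailing newline. The sketch below only builds that request body. How the Artirix component detects changes in MySQL or MongoDB is not described here, so the `upserts`/`deletes` inputs and the `id` field are assumptions.

```python
import json
from typing import Iterable, List


def bulk_body(upserts: Iterable[dict], deletes: Iterable[str], index: str,
              id_field: str = "id") -> str:
    """Build an Elasticsearch _bulk request body (newline-delimited JSON)."""
    lines: List[str] = []
    for record in upserts:
        doc = dict(record)                   # copy so the caller's record is untouched
        doc_id = str(doc.pop(id_field))      # primary-store id becomes the index _id
        lines.append(json.dumps({"index": {"_index": index, "_id": doc_id}}))
        lines.append(json.dumps(doc))        # document source follows its action line
    for doc_id in deletes:
        # Delete actions have no source line.
        lines.append(json.dumps({"delete": {"_index": index, "_id": str(doc_id)}}))
    # The bulk API requires the body to end with a newline.
    return ("\n".join(lines) + "\n") if lines else ""
```

The resulting string would be POSTed to `/_bulk` with the `Content-Type: application/x-ndjson` header; the `index` action creates or replaces the document, which suits a mirror.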

Then analyse and experiment

Event tracking is a core component of any service. Key events are tracked from queries, leads and user interface interactions. For our customers we deliver visualisation tools to gauge how well the system is performing, from both a systems and a customer engagement perspective. We typically use log cabin (an Artirix open source component), Kibana for visualisation alongside our own dashboards, S3 for storage, and our own events JavaScript for collecting large volumes of event data from your platform. This helps you make decisions and enhancements that drive your business further. We have worked with various companies to analyse their data sets and drive value from their data.
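Service metrics such as search response times, zero-result searches and popular queries can be derived from tracked search events. Here is a minimal Python sketch, assuming a hypothetical event shape with `type`, `query`, `hits` and `took_ms` fields; in practice these figures would more likely come from Elasticsearch aggregations or Kibana over the stored events.

```python
from collections import Counter
from typing import Dict, Iterable


def summarise_search_events(events: Iterable[dict], top_n: int = 3) -> Dict[str, object]:
    """Aggregate search events into simple service and engagement metrics."""
    queries: Counter = Counter()
    zero: Counter = Counter()
    total_ms = 0
    count = 0
    for e in events:
        if e.get("type") != "search":
            continue  # ignore leads, clicks and other event types
        q = e["query"].strip().lower()  # normalise so "Bike" and "bike " count together
        queries[q] += 1
        if e["hits"] == 0:
            zero[q] += 1
        total_ms += e["took_ms"]
        count += 1
    return {
        "searches": count,
        "avg_took_ms": total_ms / count if count else 0.0,
        "zero_result_rate": sum(zero.values()) / count if count else 0.0,
        "popular_queries": queries.most_common(top_n),
        "zero_result_queries": zero.most_common(top_n),
    }
```

Zero-result queries are often the most actionable output, since they point at missing content or synonyms.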

We put together this article to discuss search analytics in more detail.

Don't worry, it doesn't end there ...

we've got your back with our APIs

A wide variety of APIs are made available to deliver a complete experience:

  • User API – for user storage and profiling
  • Analytics API – allowing events to be tracked and stored centrally
  • Alerting API – allowing users to save items and save searches
  • Leads API – to route leads from your service to advertisers
  • Search API – the core API for building a rich visual experience for users searching the items. This is typically delivered with faceted search, geo search and map visualisation of the data.
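As an illustration of the faceted and geo search a Search API might issue, here is a sketch that builds an Elasticsearch query body combining a full-text `match` query, a `geo_distance` filter and one `terms` aggregation per facet. The function name and the field names (`title`, `location` and the facet fields) are assumptions; `location` would need a `geo_point` mapping and the facet fields a `keyword` mapping.

```python
from typing import Dict, List


def build_search_body(text: str, lat: float, lon: float, radius_km: float,
                      facets: List[str], size: int = 20) -> Dict:
    """Build an Elasticsearch query body for faceted, geo-filtered search."""
    return {
        "size": size,
        "query": {
            "bool": {
                # Full-text relevance on the (assumed) title field.
                "must": [{"match": {"title": text}}],
                # Keep only items within radius_km of the point; filters do not affect scoring.
                "filter": [{
                    "geo_distance": {
                        "distance": f"{radius_km}km",
                        "location": {"lat": lat, "lon": lon},
                    }
                }],
            }
        },
        # One terms aggregation per facet, returning counts per facet value.
        "aggs": {name: {"terms": {"field": name}} for name in facets},
    }
```

The aggregation buckets in the response can drive the facet sidebar, while the `location` values of the hits can be plotted for the map view.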

Ready to get started?