What's under the hood of an Artirix classified service?
Clients have used our complete set of components to power their classified services and rely on us to continue to deliver through our managed service model. Let us take you behind the scenes of one of our classified services.
We take feed data or content
Our feed processing framework is designed to cope with hundreds of different structured data feed formats, in massive volumes. Feed processing is configured through plugins in our feed management component, paired with an online management tool, so we can quickly create new processors. We have customers feeding 200 million documents to us, all handled in the cloud. Built in Python and deployed to EC2, the framework scales to very high volumes.
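As a rough illustration of the plugin approach described above, here is a minimal sketch of a registry where each feed format gets its own processor class. All names (`feed_processor`, `CsvListingProcessor`) are hypothetical; the real framework is not public.

```python
# Hypothetical sketch of a plugin-registered feed processor.
PROCESSORS = {}

def feed_processor(format_name):
    """Register a parser class for one structured feed format."""
    def register(cls):
        PROCESSORS[format_name] = cls
        return cls
    return register

@feed_processor("csv-listings")
class CsvListingProcessor:
    def parse(self, raw):
        # Turn one raw CSV record into a normalised document dict.
        ref, title, price = raw.split(",")
        return {"ref": ref, "title": title, "price": float(price)}

def process(format_name, raw_records):
    """Look up the processor for a format and parse every record."""
    proc = PROCESSORS[format_name]()
    return [proc.parse(r) for r in raw_records]

docs = process("csv-listings", ["A1,Cottage,250000"])
```

Adding support for a new feed format then means writing one small class and registering it, rather than touching the core framework.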
Or users can publish items directly
Many of our customers had a mixture of feeds and listings that their advertisers wanted to create and manage through self-service. So we built a self-service user interface that lets advertisers log in and serve themselves: publishing new listings, upgrading, checking out, viewing leads, and viewing analytics on the performance of their adverts.
In near real-time ...
Our push API supports JSON to create, update and delete records, running in synchronous or asynchronous mode, with callbacks. It handles prioritisation of content objects so urgent updates can jump the queue. Problem documents are retried and then fed to a dead-letter queue.
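To make the push API behaviour concrete, here is a sketch of what a JSON payload and the retry-then-dead-letter handling might look like. The field names and the `deliver` helper are assumptions for illustration, not the actual API contract.

```python
import json

# Hypothetical payload shape for a push API call.
payload = json.dumps({
    "action": "update",                  # create | update | delete
    "mode": "async",                     # synchronous or asynchronous
    "priority": "high",                  # high-priority objects jump the queue
    "callback_url": "https://example.com/hooks/feed-status",
    "record": {"ref": "A1", "title": "Cottage", "price": 250000},
})

def deliver(doc, handler, max_retries=3, dead_letters=None):
    """Try a handler a few times; park failing documents in a dead-letter list."""
    dead_letters = dead_letters if dead_letters is not None else []
    for _ in range(max_retries):
        try:
            return handler(doc)
        except Exception:
            continue
    dead_letters.append(doc)  # problem document kept for later inspection
    return None
```

The dead-letter queue means a single malformed document never blocks the rest of the stream; it can be inspected and replayed separately.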
We enrich and process
Text processing, content processing – it goes by a few different labels. At Artirix we have built a framework to mass-process content objects, with the goal of enriching and normalising them at scale and speed. The architecture is based on a distribution point feeding many pipeline nodes, and content is routed through a pipeline made up of steps:
- Entity extraction based on a dictionary, or pattern matching approaches (e.g. item features, proper names, email addresses)
- Address extraction and normalisation – turn messy locations into structure and add geocodes for location-based content.
- Deduplication based on content properties
- Media download and process – detect linked media objects, retrieve them to a common store, and process them to common properties
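The steps above can be sketched as a chain of small functions, each taking and returning a document dict; a distribution point would route documents to pipeline nodes running this chain. Step names and field names are assumptions, and the geocoding is stubbed out.

```python
import re

def extract_entities(doc):
    # Pattern-matching entity extraction, e.g. email addresses in the body.
    doc["emails"] = re.findall(r"[\w.+-]+@[\w-]+\.[\w.]+", doc["body"])
    return doc

def normalise_address(doc):
    # Turn a messy location string into structure (geocoding omitted here).
    doc["location"] = {"raw": doc.pop("address").strip().title()}
    return doc

def dedupe_key(doc):
    # Deduplication key based on content properties.
    doc["dedupe_key"] = (doc["title"].lower(), doc["location"]["raw"])
    return doc

PIPELINE = [extract_entities, normalise_address, dedupe_key]

def enrich(doc):
    """Route one content object through every pipeline step in order."""
    for step in PIPELINE:
        doc = step(doc)
    return doc

doc = enrich({"title": "Cottage", "body": "Contact bob@example.com",
              "address": "  10 high street, york "})
```

Because each step has the same signature, new enrichment stages can be inserted into `PIPELINE` without changing the routing code.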
Managing the data and advertisers
Alongside our feed and enrichment framework, at Artirix we've built a CRM and administration framework to associate the items supplied with the advertisers behind them. This allows us to track usage data and leads, and report performance back to advertisers. It also allows us to support different business models, such as subscriptions, freemium and paid upgrades. Our components keep items in the index updated with their advertising state, and an advert serving API allows ads to be positioned throughout a site with advert request codes.
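One plausible shape for serving adverts by request code is a lookup from slot codes to slot properties, with inventory filtered to match. The slot names, sizes and tiers below are invented for illustration only.

```python
# Hypothetical advert slot definitions, keyed by advert request code.
AD_SLOTS = {
    "home-top": {"size": "728x90", "tier": "featured"},
    "results-side": {"size": "300x250", "tier": "premium"},
}

def serve_advert(request_code, inventory):
    """Resolve a page's advert request code to a matching advert."""
    slot = AD_SLOTS[request_code]
    for ad in inventory:
        if ad["tier"] == slot["tier"]:  # paid tier gates placement
            return {"slot": request_code, "size": slot["size"], "ad": ad["id"]}
    return None  # no advert of the required tier available
```

Embedding only the request code in page templates keeps placement decisions (and business rules like tiers) on the server side.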
Start the search
So we have a stream of fed, normalised and enriched content. We store this typically in MySQL or MongoDB, with an Artirix component then responsible for delivering a mirror to Elasticsearch. This core part of the architecture is built for scale, with example customers storing in excess of 200m records. We use Elasticsearch for other cool functions as well, such as analysing trends in the index, and storing logs and behavioural search events. This allows us to monitor the state of the platform from both a service and content analytics perspective (e.g. response times for search, zero results, popular queries, time series for content trends in the index).
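The mirroring idea can be sketched as follows: the primary database stays the source of truth, and a sync component emits actions in the shape the Elasticsearch bulk API expects. Index and field names here are assumptions.

```python
def bulk_actions(records, index="listings"):
    """Yield index actions in the shape expected by the Elasticsearch bulk API."""
    for rec in records:
        yield {
            "_op_type": "index",
            "_index": index,
            "_id": rec["ref"],  # stable id so repeated syncs are idempotent
            "_source": {k: v for k, v in rec.items() if k != "ref"},
        }

actions = list(bulk_actions([{"ref": "A1", "title": "Cottage", "price": 250000}]))
# Against a live cluster this generator would be passed to
# elasticsearch.helpers.bulk(es_client, bulk_actions(records)).
```

Using the primary-store key as the document `_id` means re-running the mirror simply overwrites documents rather than duplicating them.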
Then analyse and experiment
Don't worry, it doesn't end there ...
we've got your back with our APIs
A wide variety of APIs are made available to deliver a complete experience:
- User API – for storage and profiling
- Analytics API – allowing events to be tracked and stored centrally
- Alerting API – allowing users to save items and save searches
- Leads API – to route leads from your service to advertisers
- Search API – the core API to build a rich visual experience for users to search the items. This is typically delivered with faceted search, geo search, and map visualisation of the data.
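To show what faceted, geo-aware search looks like in practice, here is a sketch of the kind of Elasticsearch-style query body a Search API might build. The field names (`title`, `property_type`, `location`) are assumptions for illustration.

```python
def build_search(text, facet_field="property_type",
                 lat=None, lon=None, radius_km=10):
    """Build a query with free text, facet counts, and an optional geo filter."""
    query = {
        "query": {"bool": {"must": [{"match": {"title": text}}]}},
        # A terms aggregation provides the counts behind each facet.
        "aggs": {facet_field: {"terms": {"field": facet_field}}},
    }
    if lat is not None and lon is not None:
        # Restrict results to a radius around the user's location.
        query["query"]["bool"]["filter"] = [{
            "geo_distance": {
                "distance": f"{radius_km}km",
                "location": {"lat": lat, "lon": lon},
            }
        }]
    return query

q = build_search("cottage", lat=53.96, lon=-1.08)
```

Keeping query construction behind an API like this lets the site vary facets and radii per page without exposing the search engine directly to the front end.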