RichRelevance

Streaming Snapshot Service

Introduction

Creating product and place snapshots is the next step after creating the property definition collections. A snapshot is an instance of a catalog and is required before ingesting items (products, categories, regions).

A snapshot must be associated to the required property definition collections before items can be submitted through the streaming-ingest API.

The property definition collections must be "published" before creating a snapshot. 

There are four types of snapshots: Product, Place, referenceContent and Assortments. A Place snapshot should always be created and activated. The Product snapshot is for the product catalog. The referenceContent snapshot lets Find index content. The Assortments snapshot is reserved for a future feature.

 

[Figure: Streaming Catalog Main Flow]


Snapshot Types

Product Snapshot

A product snapshot is required before catalog items can be added through the streaming-ingest service. The product snapshot is associated to both a "published" product and category property definition collection. 

Find requires that the search attributes have already been set up in the product property definition collection. Search attributes are specific to Find and are not required for Recommend.

NOTE:

  • The streaming catalog does not use the Portal’s search attributes but there are plans to offer this functionality in the future. 
  • Attributes that are searchable are still required in the Portal's search attributes. 

Place Snapshot

The place snapshot is required to define regions. The place snapshot requires the "published" region property definition collection. Region over-rides are still added to the product (at ingest time) and the streaming engine will validate the regions by referencing the region definitions in the place snapshot. 

Note: The Place snapshot is required and must be "active" even if regions are not used.  

Snapshot States

A snapshot has five states: creating, complete, active, archived and deleted.

For each environment there is a limit on the number of snapshots in each state. In production, each site can have at most two "creating", one "complete", one "active" and three "archived" snapshots at any one time. The number of "deleted" snapshots is unlimited. Limit details are given under Snapshot Limits.

When a snapshot is initially created and associated with the required property definition collections, it is in the "creating" state with an empty catalog. A "creating" snapshot is used for testing how items flow through the streaming catalog; its items do not go to Find or the Legacy Catalog. They do go to the view store, so transformations can be viewed.

When ingesting data, a snapshot is referenced and the data flows to "engine.out". How data flows from "engine.out" to Find and the Catalog Database (Recommend & Discover) depends on the snapshot state and is discussed further below:

 

Find and Snapshot States

Items that are in the engine.out topic are replicated to the Front End data centers.

  • Items in a "creating" snapshot only flow to the streaming-view store and not to Find.  
  • The "complete" state triggers the creation of a Find index called the "Cutover Snapshot Collection". It is not accessed by the Find API but indexes all products ingested into a "complete" snapshot. This provides an opportunity to add catalog items without making them live (in Find) until the snapshot becomes "active".
  • When a "complete" snapshot is "activated", that snapshot's index becomes the production "Current Snapshot Collection". The previous "active" snapshot is archived.

Recommend & Discover (postgres database):

For Recommend & Discover, the product and category items in a snapshot that is in either the "complete" or "active" state flow to the Recommend & Discover database (postgres) via the Legacy Catalog Adaptor. When a snapshot is in the "creating" state, the Legacy Catalog Adaptor does not pick up the items. "Creating" is just for testing items flowing to the view store.

Activating a "complete" snapshot has no specific impact on the Recommend DB, because the Legacy Catalog Adaptor is not snapshot aware. This means that items from any "complete" or "active" snapshot may be in the Legacy Catalog. The Legacy Catalog can be cleaned up either by deleting the items that belong to another snapshot or by running a scoped action that syncs the Legacy Catalog to the "active" snapshot. The scoped action ensures that the items in the Legacy Catalog match those in the "active" snapshot by deleting any Legacy Catalog items that are not in the current "active" snapshot. More information on the scoped action to sync the Legacy Catalog can be found in the Scoped Action chapter.
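The effect of the sync described above can be pictured as a set difference. The following is only an illustrative sketch: the function name and the item ids are hypothetical, and it models just the deletion behaviour stated in the text, not how the scoped action is actually invoked or implemented.

```python
def items_removed_by_sync(legacy_catalog_ids, active_snapshot_ids):
    """Return the ids a sync to the "active" snapshot would delete.

    Assumption: items are identified by a simple id. Anything present in
    the Legacy Catalog but absent from the active snapshot is removed.
    """
    return set(legacy_catalog_ids) - set(active_snapshot_ids)


# Hypothetical ids: "sku-9" came from an older snapshot.
legacy_catalog = {"sku-1", "sku-2", "sku-9"}
active_snapshot = {"sku-1", "sku-2", "sku-3"}

print(sorted(items_removed_by_sync(legacy_catalog, active_snapshot)))
```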

NOTE: For both Find and Recommend customers:

Once the "creating" test snapshot is working well, create a new snapshot with the same "published" property definition collections. Set this new "creating" snapshot to "complete" immediately, before ingesting items. Once it is "complete", ingested items flow to Find and the Catalog Database (Recommend and Discover).

This new, empty snapshot is then used to ingest items, which flow to Find's "cutover snapshot collection" as well as to the Legacy Catalog Adaptor (to support Recommend and Discover).

Once the snapshot is "activated", its items flow to the current production Find Solr collection. When a snapshot is activated, the previously active snapshot is archived.

 

Summary of Snapshot States and Impact on Find, Catalog Database, and View Store

The table below is a summary of actions for each snapshot state on Find, Recommend and the View Store. These actions apply to the product snapshot.

Snapshot State | Find Index (Cutover) | Find Index (Production) | Catalog Database (Recommend & Discover) | View Store | Notes
Creating | No items | No items | No items | Accept items | "Creating" is used for validation and testing.
Complete | Accept items | No items | Accept items | Accept items | For Recommend or Find: a new snapshot is created and immediately set to "complete" before ingesting data.
Active | No items | Accept items | Accept items | Accept items | If there was another active snapshot, it is archived. For Find, the cutover index becomes the production index. Active snapshots cannot be deleted.
Archived | No items | No items | No items | No items | Archived snapshots are read-only and may be deleted. Items directed to an archived snapshot fail and are sent to the invalid item store.
Deleted | No items | No items | No items | No items | No access to the deleted snapshot.
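The summary table can also be read as a lookup from snapshot state to the stores that accept items. The sketch below encodes that reading in Python; the destination names (`view_store`, `find_cutover_index` and so on) are labels invented for this example, not identifiers used by the service.

```python
# Destinations that accept items for a product snapshot in each state,
# transcribed from the summary table. The key names are illustrative only.
DESTINATIONS = {
    "creating": {"view_store"},
    "complete": {"find_cutover_index", "catalog_database", "view_store"},
    "active": {"find_production_index", "catalog_database", "view_store"},
    "archived": set(),  # items sent here end up in the invalid item store
    "deleted": set(),
}


def destinations_for(state):
    """Return the set of destinations that accept items in `state`."""
    try:
        return DESTINATIONS[state]
    except KeyError:
        raise ValueError(f"unknown snapshot state: {state!r}") from None


print(sorted(destinations_for("complete")))
```

A lookup like this makes it easy to see, for instance, that the catalog database receives items in both the "complete" and "active" states, while only an "active" snapshot reaches the production Find index.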

 

Snapshot Limits

It is possible to have multiple snapshots, but in both production and QA there are limits. In production there can only be two "creating", one "complete", one "active", three "archived" and unlimited "deleted" snapshots. Only "creating", "complete" and "archived" snapshots can be deleted.

There is a Snapshot Service API call to determine the limits in a specific environment. Also there is a call to determine the number of snapshots for a specific customer site.   
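As a rough client-side illustration of the production limits, the sketch below checks whether one more snapshot could enter a given state. The function and variable names are hypothetical, and the real limits should always be read from the service rather than hard-coded. "active" is left out on purpose, because activating a snapshot replaces (archives) the current active one instead of adding a second.

```python
# Production limits per state as stated in the text. "deleted" is unlimited
# and "active" is handled by replacement, so neither appears here.
PRODUCTION_LIMITS = {"creating": 2, "complete": 1, "archived": 3}


def has_room(counts, state, limits=PRODUCTION_LIMITS):
    """Return True if one more snapshot may enter `state` under `limits`.

    `counts` maps state name -> current number of snapshots for the site.
    States without an entry in `limits` are treated as unlimited.
    """
    limit = limits.get(state)
    if limit is None:
        return True
    return counts.get(state, 0) < limit


current = {"creating": 2, "complete": 0, "active": 1, "archived": 1}
print(has_room(current, "creating"))  # a third "creating" snapshot is over the limit
print(has_room(current, "complete"))
```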

Below are the snapshot limits in production and QA as well as which action can be applied to a specific snapshot state.

Action | Supported Actions for Snapshot States | Snapshot Limit in Production & Staging | Snapshot Limit in QA | Notes
Create | New snapshot --> creating | 2 | 5 | Response includes snapshotId.
Complete | creating --> complete | 1 | 3 |
Activate | complete --> active | 1 | 1 | If an "active" snapshot already exists, it will be archived. An active snapshot cannot be deleted.
Archive | creating --> archived; complete --> archived. Note: an "active" snapshot cannot be archived directly. It is archived automatically when a new snapshot is activated. | 3 | 10 | Can only view information. Any information sent to an archived snapshot will be sent to the invalid item store.
Delete | creating --> deleted; complete --> deleted; archived --> deleted. Note: Deleting a "creating" and "complete" snapshot will be supported soon. | Unlimited | Unlimited | The Find Solr collection (index) will also be deleted. Deleted snapshots will be purged from the system on a regular basis. Items will be removed from the view store.
Cancel | creating --> deleted; complete --> deleted | Unlimited | Unlimited | Cancel stops all processing of items and the state will be "deleted". Cannot cancel an active or archived snapshot.
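The supported actions form a small state machine. The sketch below is one way to encode the transitions from the table so that a client can reject an unsupported action before calling the service. It uses the state name "complete" as returned by the API, and the function name `next_state` is an assumption made for this example.

```python
# (action, current state) -> resulting state, transcribed from the table.
# Per the table's note, deleting "creating"/"complete" snapshots may not be
# available yet; the entries are listed because the table lists them.
TRANSITIONS = {
    ("complete", "creating"): "complete",
    ("activate", "complete"): "active",
    ("archive", "creating"): "archived",
    ("archive", "complete"): "archived",
    ("delete", "creating"): "deleted",
    ("delete", "complete"): "deleted",
    ("delete", "archived"): "deleted",
    ("cancel", "creating"): "deleted",
    ("cancel", "complete"): "deleted",
}


def next_state(action, current_state):
    """Return the state that `action` leads to, or raise if unsupported."""
    try:
        return TRANSITIONS[(action, current_state)]
    except KeyError:
        raise ValueError(
            f"action {action!r} is not supported for a {current_state!r} snapshot"
        ) from None


print(next_state("activate", "complete"))
```

Note that there is no entry that archives or deletes an "active" snapshot: it only leaves that state when another snapshot is activated.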

 

A new snapshot is required if there is a breaking change to a property definition. This includes any change to an existing property definition. Adding a new property to a published property collection is not a breaking change and is supported. 

If a change is made to an existing property definition then:

  • Clone or create a new property definition collection with the changes.
  • Publish the property definition collection.
  • Create a new snapshot with the published property definition collections.
  • Complete the snapshot and re-ingest the items.
  • Activate the snapshot.

Once a snapshot is archived, then any further updates to the archived snapshot are sent to the invalid items store.   

Note: Adding a property to a "published" property definition is supported for both Find and Recommend (Legacy Catalog).  

Streaming Snapshot Service Requests

Base URL: https://<host>/streaming-snapshot/v1/

'Authorization: Bearer <tokenValue>'

Full URL: https://<host>/streaming-snapshot/v1/<apiKey>/<action>/<snapshotId>/<snapshotType>

'Authorization: Bearer <tokenValue>'
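A small helper can assemble the full URL and the Authorization header from their parts. This is a minimal sketch in Python using only the standard library. The function names are hypothetical, and the host and token are placeholders that must be supplied by the caller.

```python
from urllib.parse import quote


def action_url(host, api_key, action, snapshot_id, snapshot_type):
    """Build https://<host>/streaming-snapshot/v1/<apiKey>/<action>/<snapshotId>/<snapshotType>."""
    parts = [api_key, action, snapshot_id, snapshot_type]
    # Percent-encode each path segment so unexpected characters cannot
    # change the structure of the URL.
    path = "/".join(quote(str(part), safe="") for part in parts)
    return f"https://{host}/streaming-snapshot/v1/{path}"


def auth_headers(token):
    """Return the bearer-token header expected by the snapshot service."""
    return {"Authorization": f"Bearer {token}"}


print(action_url("gateway.richrelevance.com", "123", "complete", 9088, "product"))
```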

 

Actions API Syntax

Create a new snapshot.

For a product snapshot, the "published" product and category property definition collections are required.

For a place snapshot, the "published" region property definition collection is required.

POST {baseURL}/<apiKey>/create/<snapshotType>

Body for a product snapshot:

{"name": "<snapshotName>",
"propertyDefinitionCollectionIds": {
"category": <categoryPropertyCollectionId>, "product": <productPropertyCollectionId>}}

List all creating and active snapshots. Archived snapshots are not included.

GET {baseURL}/<apiKey>

List all snapshots, including archived snapshots.

GET {baseURL}/<apiKey>?include=archived

List all snapshots (creating, complete, active) plus deleted snapshots. Not yet available; coming soon.

List all creating, complete and active snapshots for a specific snapshot type. snapshotType can be: product, place or referenceContent. Archived and deleted snapshots are not returned.

GET {baseURL}/<apiKey>?snapshotType=<snapshotType>

List all snapshots for a specific snapshot type, including archived snapshots, for a specific site (apiKey).

GET {baseURL}/<apiKey>?snapshotType=<snapshotType>&include=archived

List all active snapshots for this site (apiKey).

GET {baseURL}/<apiKey>?state=active

Get a specific snapshot by id.

GET {baseURL}/<apiKey>/<snapshotId>

Get a specific snapshot by snapshot id and snapshot type.

GET {baseURL}/<apiKey>/<snapshotId>?snapshotType=<snapshotType>
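The list requests differ only in their query parameters (`snapshotType`, `include=archived`, `state`). The following sketch builds those URLs with Python's standard `urllib.parse.urlencode`. The function name and its keyword arguments are invented for this example, and combinations of parameters beyond those shown in the table are untested assumptions.

```python
from urllib.parse import urlencode


def list_snapshots_url(base_url, api_key, snapshot_type=None,
                       include_archived=False, state=None):
    """Build a snapshot listing URL from the optional filters."""
    params = {}
    if snapshot_type is not None:
        params["snapshotType"] = snapshot_type
    if include_archived:
        params["include"] = "archived"
    if state is not None:
        params["state"] = state
    # The documented base URL ends in "/", so strip it to avoid "//".
    url = f"{base_url.rstrip('/')}/{api_key}"
    return f"{url}?{urlencode(params)}" if params else url


BASE = "https://gateway.richrelevance.com/streaming-snapshot/v1/"
print(list_snapshots_url(BASE, "123"))
print(list_snapshots_url(BASE, "123", snapshot_type="product", include_archived=True))
```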

Change a snapshot’s state from “creating” to “complete”. A "complete" snapshot's items will be ingested by the Legacy Catalog Adaptor and Find. 

Note: There is a limit of 1 complete snapshot in production and staging. 

POST {baseURL}/<apiKey>/complete/<snapshotId>/<snapshotType>

Example for snapshotID: 9088

POST https://<host>/streaming-snapshot/v1/<apiKey>/complete/9088/product

Update a snapshot's state from "complete" to "active". "Active" indicates to Find that the catalog items managed by the snapshot are ready to be cut over to the production Find index. Until the snapshot is "active", all items are indexed in the "Cutover Find Index".

The Legacy Catalog does not differentiate between a "complete" and an "active" snapshot. However, activating is recommended even for Recommend-only customers.

POST {baseURL}/<apiKey>/activate/<snapshotId>/<snapshotType>

 

Example for snapshotID: 9088. 

POST https://<host>/streaming-snapshot/v1/<apiKey>/activate/9088/product

      

Archive a snapshot. Only a "complete" or "creating" snapshot can be archived.

There is a limit of 3 archived snapshots (production & staging).

An "active" snapshot is automatically archived when another snapshot is activated.

POST {baseURL}/<apiKey>/archive/<snapshotId>/<snapshotType>

 

Example for snapshotID: 9088

POST https://<host>/streaming-snapshot/v1/<apiKey>/archive/9088/product

 

Response:

{
    "statusTracker": {
        "trackingId": "f517be38-2aab-11eb-b2fb-57183eef702a",
        "trackingInstant": "2020-11-19T21:12:40.468024800Z"
    },
    "snapshot": {
        "id": 9088,
        "siteId": 608,
        "state": "archived",
        "propertyDefinitionCollectionIds": {
            "product": 22506,
            "category": 22505
        },
        "lastModified": "2020-11-19T21:12:40.469052Z",
        "name": "snapshotName",
        "type": "product"
    }
}
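A client will usually want two things from a response like this: the trackingId, for following up with the status service, and the resulting snapshot state. The sketch below pulls both out with Python's `json` module. The sample text is a trimmed copy of the response above, and the function name is hypothetical.

```python
import json

# Trimmed copy of the archive response shown above.
SAMPLE_RESPONSE = """
{
  "statusTracker": {
    "trackingId": "f517be38-2aab-11eb-b2fb-57183eef702a",
    "trackingInstant": "2020-11-19T21:12:40.468024800Z"
  },
  "snapshot": {"id": 9088, "siteId": 608, "state": "archived",
               "name": "snapshotName", "type": "product"}
}
"""


def summarize_action_response(response_text):
    """Return the fields most useful for follow-up calls."""
    body = json.loads(response_text)
    snapshot = body["snapshot"]
    return {
        "trackingId": body["statusTracker"]["trackingId"],
        "snapshotId": snapshot["id"],
        "state": snapshot["state"],
        "type": snapshot["type"],
    }


print(summarize_action_response(SAMPLE_RESPONSE))
```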

  

Delete a "creating", "complete" or "archived" snapshot.

POST {baseURL}/<apiKey>/delete/<snapshotId>/<snapshotType>

Cancel a "creating" or "complete" snapshot. Items will not be processed. Cancel is meant for cases where a mistake was made and the ingestion of items needs to stop. The state of the snapshot will be "deleted".

POST {baseURL}/<apiKey>/cancel/<snapshotId>/<snapshotType>

Limits: See the snapshot limits for the QA, Staging or Production environment.

GET {baseURL}/<apiKey>/count/snapshot/limits

Count snapshots: View a count of each type of snapshot for one site. This does not include the deleted snapshots.

GET {baseURL}/<apiKey>/count/snapshot

Count snapshots: View a count of each type of snapshot for one site, including all deleted snapshots.

GET {baseURL}/<apiKey>/count/snapshot?includeDeleted=true

Example
 

GET https://<host>/streaming-snapshot/v1/123/count/snapshot?includeDeleted=true

 

Response:

{
    "counts": {
        "123": {
            "product": {
                "archived": 2,
                "deleted": 1,
                "creating": 2,
                "active": 1,
                "complete": 1
            }
        }
    }
}
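The count response is nested by site (apiKey), then snapshot type, then state. As a hedged illustration, the sketch below reads the per-state counts for one site and type and totals the snapshots that are not deleted. The helper names are invented for this example; the sample is the response shown above.

```python
import json

SAMPLE_COUNTS = """
{"counts": {"123": {"product": {"archived": 2, "deleted": 1,
                                "creating": 2, "active": 1, "complete": 1}}}}
"""


def state_counts(response_text, api_key, snapshot_type):
    """Return {state: count} for one site and snapshot type ({} if absent)."""
    counts = json.loads(response_text)["counts"]
    # JSON object keys are strings, so the apiKey is normalised with str().
    return counts.get(str(api_key), {}).get(snapshot_type, {})


def total_not_deleted(per_state):
    """Sum the counts of every state except "deleted"."""
    return sum(n for state, n in per_state.items() if state != "deleted")


product_counts = state_counts(SAMPLE_COUNTS, 123, "product")
print(product_counts["creating"], total_not_deleted(product_counts))
```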

Count all items in the view store and the legacy catalog. A trackingId is returned; the streaming status call with that trackingId provides the counts. See the example below.

POST https://<host>/streaming-snapshot/v1/<apiKey>/count/<snapshotId>/product

Parameters

Name Description Details

snapshotName

Snapshot name - string (optional). If not provided, the name will be the same as the snapshot ID.

apiKey

Unique identifier for a customer's environment. For example, if a customer has multiple environments in production or staging, each would have a unique apiKey. There can be many apiKeys associated with a client_ID. Provided by RR to the customer.

snapshotId

Unique identifier for the snapshot. The snapshot id is provided in the response to the POST that creates the snapshot.

state

There are four states: creating, complete, active and archived. (A deleted or cancelled snapshot has the state "deleted".)
action

There are five actions. See Summary of Actions for Snapshot States for more information.

"create" - is to create a snapshot. State will be "creating" and ingested items will only go to the streaming-view store. Items will NOT go to the Recommend Database or Find index.

"complete" - For Recommend customers, the product snapshot must be in either the "complete" or "active" state. Find indexes the items in the "Cutover Find Index".

"activate" - For Find customers, an active snapshot will be indexed in the production Find index. If there was a current "active" snapshot, it will be archived.  Place snapshot must be "active".

"archive" - Items cannot be added to an archived snapshot.

"count" -  The "count" action is used for two purposes:

The first is to count the number of snapshots.

The second is to count all products, categories, regions, referenceContent, and assortments in both the legacy catalog and view store. For product, a count of recommendable and non-recommendable products, and categories is provided for the specified snapshot. The product snapshot needs to be referenced in the call.  The counts will be in the status message associated to the trackingId provided in the response. 

This call is restricted to once every 30 minutes. If a second call is made within the 30 minutes, the same results from the first call will be returned. 

This count method is recommended for customers who have millions of products.  An example is provided.

create, complete, activate, archive, count
snapshotType

product - requires product and category property definitions. Product items can have region and language over-rides.  

place -  is required for regions. All updates to regions are through the place snapshot type.  The place snapshot must be "active" and requires the region property definition.

referenceContent - specific to content search. 

 

product, place, referenceContent
propertyDefinitionCollectionIds

The ids for the required published property definition collections.

Example for a product snapshot.

{"propertyDefinitionCollectionIds": {
"category": 578,
"product": 579
} }

Example of a place snapshot

{"propertyDefinitionCollectionIds": {
"region": 580
} }

 

product snapshot

{"propertyDefinitionCollectionIds": {
"category": <categoryPropertyCollectionId>,
"product": <productPropertyCollectionId>} }

place snapshot
{"propertyDefinitionCollectionIds": {
"region": <regionPropertyCollectionId>} }

Snapshot Archetype

Required JSON body for creating a snapshot. 

Note: "name" is optional.

Product snapshot body

{
"name": "<snapshotName>",
"propertyDefinitionCollectionIds": {
"category": <categoryPropertyCollectionId>,
"product": <productPropertyCollectionId>
}
}

OR

{"propertyDefinitionCollectionIds": {
"category": <categoryPropertyCollectionId>,
"product": <productPropertyCollectionId>
}
}

lastModified

Timestamp for the snapshot creation or last change. Example: "2019-06-10T18:24:28.974652Z"

 

 

Examples

Create a new product snapshot

The property definitions for category and product have already been created and published. 

In this example the apiKey is 123. Note that "name" is optional and can be omitted. Below are examples with and without the "name" parameter in the body. 

POST https://gateway.richrelevance.com/streaming-snapshot/v1/123/create/product
'Authorization: Bearer <tokenValue>'

Body:
{"name":"test-snapshot",
"propertyDefinitionCollectionIds": {"category":578, "product":579}}

OR body without "name"
{"propertyDefinitionCollectionIds": {"category":578, "product":579}}

Response:

{
    "statusTracker": {
        "trackingId": "d6b10e13-cacf-11ea-bf3e-cf3f136f672e",
        "trackingInstant": "2020-07-20T21:27:39.719016300Z"
    },
    "snapshot": {
        "id": 400,
        "siteId": 123,
        "state": "creating",
        "propertyDefinitionCollectionIds": {
            "product": 579,
            "category": 578
        },
        "lastModified": "2020-07-20T21:27:39.718651Z",
        "name": "test-snapshot",
        "type": "product"
    }
}
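For readers scripting this call, the sketch below prepares the same create request with Python's standard `urllib.request`, without sending it. The function name is hypothetical, the `Content-Type: application/json` header is an assumption (the documentation only shows the Authorization header), and the token is a placeholder.

```python
import json
import urllib.request


def build_create_request(host, api_key, snapshot_type, collection_ids,
                         token, name=None):
    """Prepare (but do not send) a POST that creates a snapshot."""
    body = {"propertyDefinitionCollectionIds": collection_ids}
    if name is not None:  # "name" is optional and may be omitted
        body["name"] = name
    return urllib.request.Request(
        url=f"https://{host}/streaming-snapshot/v1/{api_key}/create/{snapshot_type}",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",  # assumption, not documented
        },
        method="POST",
    )


request = build_create_request(
    "gateway.richrelevance.com", "123", "product",
    {"category": 578, "product": 579},
    token="<tokenValue>", name="test-snapshot",
)
print(request.get_method(), request.full_url)
# urllib.request.urlopen(request) would send it; omitted here on purpose.
```

The snapshot id needed for the later complete and activate calls comes from `snapshot.id` in the response.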

 

The following is one method to count all products in a specific snapshot. The streaming-view service also offers responseStyle=count, but it is limited to counting 50,000 products at a time.

The "count" action

The "count" action will count the number of ingested products, categories, regions, referenceContent and assortments for both the legacy catalog and the view store. The count is provided in a status message associated to the trackingId in the response. This method is recommended for customers who have millions of products.  

Note: A product snapshot must be specified, because the view store count covers only that specific product snapshot.

Syntax is as follows:

POST https://<host>/streaming-snapshot/v1/<apiKey>/count/<snapshotId>/product
'Authorization: Bearer <tokenValue>'

Example where: apiKey=123, snapshotId=7932

POST https://gateway.richrelevance.com/streaming-snapshot/v1/123/count/7932/product
'Authorization: Bearer <tokenValue>'

Response:

{
    "message": "Kicking off counting for snapshot 7932",
    "trackingId": "71248c63-e1a7-11ea-8565-297c33d675f6",
    "trackingInstant": "2020-08-18T23:06:26.223011500Z"
}

Get the counts from the status service using the trackingId in the response


GET https://gateway.richrelevance.com/streaming-status/v1/123/trackingId/71248c63-e1a7-11ea-8565-297c33d675f6

Response:

[
    {
        "siteId": 123,
        "snapshotId": 7932,
        "statusId": "e6c47203-ee36-11ea-b4e1-49d1afc58bb7",
        "trackingId": "71248c63-e1a7-11ea-8565-297c33d675f6",
        "source": "view-store",
        "statusType": "CountEvent",
        "message": "Snapshot Count",
        "kafkaSource": {
            "topic": "streaming.engine.out",
            "partition": 7,
            "offset": 115560
        },
        "level": "SUMMARY",
        "statusData": {
            "product": {
                "total": 30,
                "recommendable": 10,
                "nonRecommendable": 20
            },
            "category": {
                "total": 0
            },
            "region": {
                "total": 0
            },
            "referenceContent": {
                "total": 0
            },
            "assortment": {
                "total": 0
            }
        },
        "datacenter": "qa",
        "statusInstant": "2020-09-03T22:43:35.550003500Z",
        "trackingInstant": "2020-09-03T22:43:35.218013100Z",
        "msSinceRequest": 332
    },
    {
        "siteId": 123,
        "snapshotId": 7932,
        "statusId": "e6addcc2-ee36-11ea-b4e1-c390e055419c",
        "trackingId": "71248c63-e1a7-11ea-8565-297c33d675f6",
        "source": "legacy-catalog",
        "statusType": "CountEvent",
        "message": "Legacy Catalog Count",
        "kafkaSource": {
            "topic": "streaming.engine.out",
            "partition": 7,
            "offset": 115560
        },
        "level": "SUMMARY",
        "statusData": {
            "product": {
                "total": 6906,
                "recommendable": 10,
                "nonRecommendable": 6896
            },
            "category": {
                "total": 0
            },
            "region": {
                "total": 5
            },
            "referenceContent": {
                "total": 0
            },
            "assortment": {
                "total": 0
            }
        },
        "datacenter": "qa",
        "statusInstant": "2020-09-03T22:43:35.402003400Z",
        "trackingInstant": "2020-09-03T22:43:35.218013100Z",
        "msSinceRequest": 184
    },
    {
        "siteId": 123,
        "snapshotId": 7932,
        "statusId": "e6943a80-ee36-11ea-8dca-01010e2a5661",
        "trackingId": "71248c63-e1a7-11ea-8565-297c33d675f6",
        "source": "streaming-engine",
        "statusType": "CountEvent",
        "message": "CountEvent",
        "kafkaSource": {
            "topic": "streaming.engine.in",
            "partition": 7,
            "offset": 329001
        },
        "level": "SUMMARY",
        "statusData": {},
        "datacenter": "qa",
        "statusInstant": "2020-09-03T22:43:35.234009600Z",
        "trackingInstant": "2020-09-03T22:43:35.218013100Z",
        "msSinceRequest": 16
    },
    {
        "siteId": 123,
        "snapshotId": 7932,
        "statusId": "e6943a41-ee36-11ea-b4e1-41cbe01e3dca",
        "trackingId": "71248c63-e1a7-11ea-8565-297c33d675f6",
        "source": "streaming-engine-sidekick",
        "statusType": "CountEvent",
        "message": "Received CountEvent for site 123 snapshot 7932",
        "kafkaSource": {
            "topic": "streaming.engine.out",
            "partition": 7,
            "offset": 115560
        },
        "level": "SUMMARY",
        "statusData": {},
        "datacenter": "qa",
        "statusInstant": "2020-09-03T22:43:35.234003300Z",
        "trackingInstant": "2020-09-03T22:43:35.218013100Z",
        "msSinceRequest": 16
    },
    {
        "siteId": 123,
        "snapshotId": 7932,
        "statusId": "e69265e4-ee36-11ea-8425-19e8e00e8fd3",
        "trackingId": "71248c63-e1a7-11ea-8565-297c33d675f6",
        "source": "streaming-snapshot",
        "message": "Counting snapshot 9113",
        "level": "SUMMARY",
        "statusData": {},
        "datacenter": "qa",
        "statusInstant": "2020-09-03T22:43:35.222013200Z",
        "trackingInstant": "2020-09-03T22:43:35.218013100Z",
        "msSinceRequest": 4
    }
]

 

Note that the counts for place and referenceContent cover all snapshots of those types. There should only be one "active" place snapshot.
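Only the status entries from the view store and the legacy catalog carry counts; the other entries have an empty statusData. The following sketch, with a hypothetical function name and a sample trimmed from the response above, collects the totals per source so the two stores can be compared.

```python
# Trimmed version of the status response above: only the fields used here.
EVENTS = [
    {"source": "view-store", "statusType": "CountEvent",
     "statusData": {"product": {"total": 30, "recommendable": 10,
                                "nonRecommendable": 20},
                    "region": {"total": 0}}},
    {"source": "legacy-catalog", "statusType": "CountEvent",
     "statusData": {"product": {"total": 6906, "recommendable": 10,
                                "nonRecommendable": 6896},
                    "region": {"total": 5}}},
    {"source": "streaming-engine", "statusType": "CountEvent",
     "statusData": {}},
    {"source": "streaming-snapshot", "statusData": {}},
]


def totals_by_source(events):
    """Return {source: {item kind: total}} for entries that carry counts."""
    result = {}
    for event in events:
        data = event.get("statusData") or {}
        if event.get("statusType") == "CountEvent" and data:
            result[event["source"]] = {
                kind: values["total"] for kind, values in data.items()
            }
    return result


totals = totals_by_source(EVENTS)
print(totals["view-store"]["product"], totals["legacy-catalog"]["product"])
```

In this sample the legacy catalog holds far more products than the view store for snapshot 7932, which is consistent with the Legacy Catalog Adaptor not being snapshot aware.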
