Using List Columns When Requesting Batch Recommendations

Adding a column that contains a list of strings to the CSV file used for requesting batch recommendations is error-prone: the list itself contains commas and double quotes, both of which have a special meaning in the CSV format.

Consider Alternatives

Therefore, first consider the alternatives listed under "Available Alternatives" below. If none of them fits your use case, pay special attention to how the list field is formatted.

Example

In the context of Froomle’s batch recommendations, the "categories" field, which applies a filter based on the listed categories, is a prime example.

"environment","page_type","list_name","list_size","categories","user_id"
"nyt","emailadhoc","emailadhoc_tech_3AjkJsNXcYaWZLPSjRXtGj",10,"[""cat1/node1/subnode1"",""cat2/node2/subnode2""]","a1UnIqUe2UsEr3Id"

List in JSON format

The "categories" field represents a list of strings in JSON format. This means:

  • the list is enclosed in square brackets,

  • list items are separated by commas, and

  • strings are between double quotes.
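
As an illustration, the JSON form of such a list can be produced with a standard JSON serializer rather than typed by hand. The sketch below uses Python's json module. The compact separators are an assumption made only to match the example above, which has no spaces after the commas.

```python
import json

categories = ["cat1/node1/subnode1", "cat2/node2/subnode2"]

# separators=(",", ":") drops the space json.dumps normally puts after each comma,
# so the result matches the compact form used in the example row.
categories_json = json.dumps(categories, separators=(",", ":"))

print(categories_json)
# ["cat1/node1/subnode1","cat2/node2/subnode2"]
```

At this stage the string is plain JSON. It is not yet safe to paste into a CSV cell.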

Embedding a JSON list of strings in CSV

Because both the CSV format and the JSON list of strings use double quotes and commas, special measures are required. To correctly format a list like "categories" in a single CSV cell, we must

  • enclose the entire list in double quotes, so that the commas inside it are not treated as field separators, and

  • escape the double quotes around the string elements in the JSON list by doubling them.
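
In practice, a CSV library applies both measures automatically. The following is a minimal sketch using Python's csv module, not a Froomle-provided tool. The choice of csv.QUOTE_NONNUMERIC and of the "\n" line terminator is an assumption made to mirror the example row; check what your own pipeline expects.

```python
import csv
import io
import json

header = ["environment", "page_type", "list_name", "list_size", "categories", "user_id"]
row = [
    "nyt",
    "emailadhoc",
    "emailadhoc_tech_3AjkJsNXcYaWZLPSjRXtGj",
    10,
    # The list is serialized to JSON first; the CSV writer then treats it as one string.
    json.dumps(["cat1/node1/subnode1", "cat2/node2/subnode2"], separators=(",", ":")),
    "a1UnIqUe2UsEr3Id",
]

buffer = io.StringIO()
# QUOTE_NONNUMERIC wraps every non-numeric field in double quotes;
# embedded double quotes are doubled by default (doublequote=True).
writer = csv.writer(buffer, quoting=csv.QUOTE_NONNUMERIC, lineterminator="\n")
writer.writerow(header)
writer.writerow(row)

csv_text = buffer.getvalue()
print(csv_text)
```

The printed text should match the two example lines shown earlier, with the "categories" cell quoted and its inner double quotes doubled.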

How To Format A JSON List in the CSV

Putting all the above together, here’s how to properly format a JSON list of strings in a single CSV cell:

  1. Start with an opening double quote ", indicating the start of the field.

  2. Insert an opening square bracket [ for the list.

  3. Enclose each element of the list in double quotes. Because these double quotes sit inside a quoted CSV field, escape each of them by doubling it. For instance, "cat1/node1/subnode1" becomes ""cat1/node1/subnode1"".

  4. Separate these elements with commas.

  5. After listing all elements, close the list with a square bracket ].

  6. End with a closing double quote " to signify the end of the field.

So, in the CSV file, the formatted "categories" field would appear as:

"[""cat1/node1/subnode1"",""cat2/node2/subnode2""]"

This format ensures the CSV parser correctly identifies the entire list as a single field and accurately interprets each category within the list, despite the presence of commas and quotes which are also integral to CSV formatting.
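
The six steps can also be sketched by hand and then checked by reading the result back. The helper name format_list_cell is hypothetical and only serves this illustration. The round trip through csv.reader and json.loads shows that a standard CSV parser sees a single field and that the field decodes to the original list.

```python
import csv
import io
import json


def format_list_cell(items):
    """Hypothetical helper: build the CSV cell for a list of strings by hand."""
    # Step 3: put each element in double quotes (json.dumps), then double those quotes.
    quoted = [json.dumps(item).replace('"', '""') for item in items]
    # Steps 1, 2, 4, 5 and 6: outer quote, bracket, comma-separated elements, bracket, outer quote.
    return '"' + "[" + ",".join(quoted) + "]" + '"'


categories = ["cat1/node1/subnode1", "cat2/node2/subnode2"]
cell = format_list_cell(categories)
print(cell)
# "[""cat1/node1/subnode1"",""cat2/node2/subnode2""]"

# Round trip: the CSV parser yields one field, which is valid JSON.
parsed_row = next(csv.reader(io.StringIO(cell)))
recovered = json.loads(parsed_row[0])
```

Building the cell by hand is mainly useful for understanding the format. For production files, letting a CSV library do the quoting is usually less error-prone.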

Available Alternatives

  1. When the filter values rarely or never change, use static values.

  2. When you want the flexibility to change your filter values over time, use a dedicated label:

    • Create a dedicated label like include_in_froomle_module_x or exclude_from_froomle_module_y (this label is typically a category or a tag).

    • Attach the label to all relevant articles. You can do this

      • manually at article creation time, or

      • automatically, based on some basic tagging rules you implement on your side. For example:

        if article.category in ('sport', 'celebrity'):
            article.add_tag('exclude_from_froomle_module_y')

    • On the Froomle side, define a static filter on the dedicated label.

    • Update the tagging rules for adding the label whenever you want, without the need to involve Froomle.