U-SQL – Extracting Information From a Column Containing JSON

There are plenty of examples on how to parse JSON files using U-SQL, but what if one of the columns of a TSV contains JSON like this example:

ID  Name  Vehicle 
1   Joe   "{""Year"":""2017"",""Make"":""Ford"",""Model"":""Fusion""}"
2   Bob   "{""Year"":""2016"",""Make"":""Kia"",""Model"":""Sorento""}"

We can adapt an example of Using the JSON User Defined Functions from the Microsoft.Analytics.Formatting.Samples.Formats library in the U-SQL Examples repository on GitHub.

Once we’ve built and registered the assemblies, the script is rather straightforward.

First up, we need to reference the assemblies we just registered.

REFERENCE ASSEMBLY [Microsoft.Analytics.Samples.Formats]; 

Next, we’ll extract the raw data

@rawData =
    EXTRACT Id int,
            Name string,
            JsonString string
    FROM "/file_with_json_column.tsv"
    USING Extractors.Tsv(skipFirstNRows:1);

In this next bit, we’ll use the JsonTuple function from the Microsoft.Analytics.Samples.Formats library convert the JSON string into a dictionary. Once we have the dictionary, we’ll pull values out of it.

@rawDataWithTuple =
    SELECT Id,
           Microsoft.Analytics.Samples.Formats.Json.JsonFunctions.JsonTuple(JsonString) AS JsonTuple
    FROM @rawData;

@expandedData =
    SELECT Id,
           JsonTuple["Year"] AS VehicleYear,
           JsonTuple["Make"] AS VehicleMake,
           JsonTuple["Model"] AS VehicleModel
    FROM @rawDataWithTuple;

Finally, we’ll output the expanded data so we can check our work.

OUTPUT @expandedData
TO "/expandedData.tsv"
USING Outputters.Tsv(outputHeader : true);

The full script and a sample file can be found here.

This entry was posted in Azure, Big Data, JSON, U-SQL. Bookmark the permalink.

2 Responses to U-SQL – Extracting Information From a Column Containing JSON

  1. Paul says:

    For most cases I assume you could use OpenJSON and Json_value? This is for when you’re already using U-SQL?

Leave a Reply

Your email address will not be published. Required fields are marked *