I’m trying to normalize the `properties` struct in the following schema into rows of `property`, `attribute`, and `value`:
```json
"data": {
  "type": "properties",
  "collection": [
    {
      "objectId": 23,
      "category": "Things",
      "externalId": "e3e052f9-0156-11d5-9301-0000863f27ad-00000017",
      "properties": {
        "parentKey1": {
          "key1": "Text of some description",
          "key2": "0.00",
          "key3": "",
          "key4": "None",
          "key5": "",
          "key6": ""
        },
        "parentKey2": {
          "key1": "0",
          "key2": "",
          "key3": "",
          "key4": "3Zu5Bv0LOHrPC10026FoUw"
        },
        "parentKey3": {
          "key1": "8355711",
          "key2": "No",
          "key3": "Default",
          "key4": "64",
          "key5": "50"
        },
        "parentKey4": {
          "key1": "Some description",
          "key2": ""
        }
      }
    }
  ]
}
```
The desired output looks like this:
property | attribute | value |
---|---|---|
parentKey1 | key1 | Text of some description |
parentKey1 | key2 | 0.00 |
parentKey1 | key3 | |
parentKey1 | key4 | None |
parentKey1 | key5 | |
parentKey1 | key6 | |
parentKey2 | key1 | 0 |
parentKey2 | key2 | |
parentKey2 | key3 | |
parentKey2 | key4 | 3Zu5Bv0LOHrPC10026FoUw |
parentKey3 | key1 | 8355711 |
parentKey3 | key2 | No |
parentKey3 | key3 | Default |
parentKey3 | key4 | 64 |
parentKey3 | key5 | 50 |
parentKey4 | key1 | Some description |
parentKey4 | key2 | |
I’m trying to achieve this using Spark SQL, but I’d be interested in a PySpark solution as well.
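To make the target semantics concrete, here is a plain-Python sketch of the flattening I’m after (using a trimmed copy of the sample data) — each nested dict of the `properties` struct becomes one `(property, attribute, value)` row:

```python
# Trimmed copy of the "properties" struct from the sample payload.
properties = {
    "parentKey1": {"key1": "Text of some description", "key2": "0.00"},
    "parentKey2": {"key1": "0", "key4": "3Zu5Bv0LOHrPC10026FoUw"},
}

# Flatten dict-of-dicts into (property, attribute, value) rows.
rows = [
    (prop, attr, value)
    for prop, attrs in properties.items()
    for attr, value in attrs.items()
]
# rows[0] -> ("parentKey1", "key1", "Text of some description")
```

This is exactly the unpivot I want Spark to perform over the struct columns.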