Converting between avro formats


Mika Borner
Hi

I'm struggling with a record conversion. I'm not sure if it's my lack of
knowledge about Avro or if this is even possible with the ConvertRecord
processor.

My input record schema is:

{
   "name": "inputrecord",
   "namespace": "event",
   "type": "record",
   "fields": [
     { "name": "foo", "type": "string" },
     { "name": "bar", "type": "string" },
     { "name": "baz", "type": "string" }
   ]
}

My output record schema:

{
   "name": "outputrecord",
   "namespace": "event",
   "type": "record",
   "fields": [
     { "name": "foo", "type": "string" },
     { "name": "bar", "type": "string" },
     { "name": "attributes",
               "type": { "type": "record",
                 "name": "attributes",
                 "namespace": "event",
                 "fields" : [
                            {"name": "baz", "type": "string"}
                            ]
               }
     }
   ]
}

So basically, I want to have baz as a child attribute under the
"attributes" element.

Currently I'm only getting "null" values back. If I use the inputrecord
as the output schema, I'm getting a value back for baz.
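For illustration only, the reshaping described above can be written as a few lines of Python. This is a minimal sketch of the intended mapping, not anything ConvertRecord does; the function name `nest_baz` and the sample values are made up for the example.

```python
def nest_baz(record: dict) -> dict:
    """Move the top-level 'baz' field under a new 'attributes' record."""
    return {
        "foo": record["foo"],
        "bar": record["bar"],
        # The nested record has to be created explicitly; it has no
        # counterpart in the flat input record.
        "attributes": {"baz": record["baz"]},
    }


# Hypothetical sample record matching the input schema.
example_in = {"foo": "a", "bar": "b", "baz": "c"}
example_out = nest_baz(example_in)
print(example_out)
```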

Thanks for your help!

Mika



Re: Converting between avro formats

Wes Lawrence
You may have to use 1.3's 'UpdateRecord'.

You can set the 'Replacement Value Strategy' to 'Record Path Value', and add a custom property of key '/attributes/baz' with a value of '/baz'.

That should convert from the input to output schemas.

--Wes

(In reply to Mika Borner's message of Wed, Jun 14, 2017 at 2:57 PM.)

Re: Converting between avro formats

Mark Payne
Mika,

ConvertRecord won't work for this type of thing, as what you really want to do is create some sort of
mapping from one schema to a completely different schema. ConvertRecord would have no way of knowing that
it should create some intermediate 'attributes' Record and push the 'baz' field inside of it. This processor is capable
of translating between 'like' schemas, such as converting the baz field from a String to an Integer. But it's not able
to handle something like this, where the schemas are not compatible.

So my first thought was to use UpdateRecord, exactly as Wes laid out here. That's not actually going to work, either,
though, because the 'attributes' Record doesn't exist. If there were an 'attributes' Record that did not have a value
for 'baz' then it would work okay, but it won't create that intermediate Record for you.
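The behaviour described above can be modelled with a toy Python function. This is only a sketch of the idea (an update that refuses to create missing parent records), not NiFi's actual RecordPath code; `set_existing_path` and the sample records are hypothetical.

```python
def set_existing_path(record: dict, path: str, value) -> bool:
    """Set the field at a slash-separated path, but only if every parent
    record on the way already exists. Returns True if the field was set."""
    *parents, leaf = path.strip("/").split("/")
    current = record
    for name in parents:
        child = current.get(name)
        if not isinstance(child, dict):
            return False  # parent record is missing, so nothing is updated
        current = child
    current[leaf] = value
    return True


# Flat input record: there is no 'attributes' record, so the update is a no-op.
flat = {"foo": "a", "bar": "b", "baz": "c"}
updated_flat = set_existing_path(flat, "/attributes/baz", flat["baz"])

# Same record with an empty 'attributes' record already present: the update succeeds.
with_parent = {"foo": "a", "bar": "b", "baz": "c", "attributes": {}}
updated_nested = set_existing_path(with_parent, "/attributes/baz", with_parent["baz"])
```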

Unfortunately, one of the downsides to using Avro is that there just isn't much tooling around it like there is around
a lot of other (older) data formats. That said, I would like to create a JoltTransformRecord processor that would allow
you to use Jolt to do this sort of transformation. But I've not gotten to that yet. One possibility, though, would be to use
ConvertRecord to convert your Avro data into JSON data, then use JoltTransform to transform between the two schemas,
and then again use ConvertRecord to convert from JSON back to Avro.

It is a bit tedious, but may be your best bet for now.
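As a rough sketch of the middle step, a Jolt "shift" specification for this mapping could look like the JSON embedded below. The small Python function is only a toy stand-in for Jolt that handles top-level fields and dotted output paths, included so the effect of the spec can be checked; `apply_flat_shift` and the sample record are assumptions for the example, not part of Jolt or NiFi.

```python
import json

# Jolt "shift" spec: each key names an input field, each value is the
# output path that field's value should be written to.
JOLT_SPEC = json.loads("""
[
  {
    "operation": "shift",
    "spec": {
      "foo": "foo",
      "bar": "bar",
      "baz": "attributes.baz"
    }
  }
]
""")


def apply_flat_shift(spec: dict, record: dict) -> dict:
    """Toy stand-in for Jolt's shift: top-level input keys, dotted output paths."""
    out: dict = {}
    for field, target in spec.items():
        if field not in record:
            continue
        *parents, leaf = target.split(".")
        node = out
        for name in parents:
            # Create intermediate records on demand, as the dotted path implies.
            node = node.setdefault(name, {})
        node[leaf] = record[field]
    return out


# Hypothetical sample record matching the input schema.
shifted = apply_flat_shift(JOLT_SPEC[0]["spec"], {"foo": "a", "bar": "b", "baz": "c"})
print(json.dumps(shifted))
```

Only the JSON array inside `JOLT_SPEC` would be given to the Jolt processor; the Python around it is there purely to demonstrate the result.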

Thanks
-Mark


(In reply to Wes Lawrence's message of Jun 14, 2017, at 4:05 PM.)

Re: Converting between avro formats

Mika Borner

Yes, I tried it the way Wes suggested, without success.

In my case I'm writing out JSON anyway, so I will take another route to transform the data.

Thanks for clarifying.

Mika


(In reply to Mark Payne's message of 06/15/2017 03:09 PM.)