HDF NiFi - Does NiFi write provenance/data on HDP nodes?

Shashi Vishwakarma

Hi 

I have an HDF cluster with 3 NiFi instances, which launch jobs (Hive/Spark) on an HDP cluster. Normally NiFi writes all of its information to several repositories on the local machine.

My question is: does NiFi write any data or provenance information, or spill anything, onto HDP nodes (e.g. the data nodes in the HDP cluster) while accessing HDFS, Hive, or Spark services?

Thanks

Shashi

Re: HDF NiFi - Does NiFi write provenance/data on HDP nodes?

Koji Kawamura
Hi Shashi,

Sorry for the delayed response. I am not aware of NiFi writing any
provenance information on HDP nodes. But if your goal is to expose
NiFi provenance data to HDFS, Hive (or Spark) so that you can analyze
it with those services, then SiteToSiteProvenanceReportingTask
might be helpful.

SiteToSiteProvenanceReportingTask can send provenance events in JSON
format. You can send them to a NiFi input port and then write them into
HDFS with the PutHDFS processor.
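As a rough illustration of what you can do once the events are stored, the Python sketch below treats a payload as a JSON array of event objects. The field names (`eventType`, `componentName`, `transitUri`) and the sample data are assumptions for illustration, not a documented schema; check them against the JSON your NiFi version actually emits. The sketch counts events by type and lists the destinations of SEND events, which is the kind of record a security review might use to see where data left NiFi.

```python
import json
from collections import Counter

# Hypothetical sample payload. Field names are ASSUMED to resemble what
# SiteToSiteProvenanceReportingTask emits; verify against your NiFi version.
SAMPLE = """[
  {"eventId": 1, "eventType": "RECEIVE", "componentName": "ListenHTTP",
   "transitUri": "http://0.0.0.0:8081/contentListener"},
  {"eventId": 2, "eventType": "SEND", "componentName": "PutHDFS",
   "transitUri": "hdfs://namenode:8020/data/out/file1.json"},
  {"eventId": 3, "eventType": "DROP", "componentName": "PutHDFS"}
]"""


def summarize(payload: str):
    """Count events by type and collect the transit URIs of SEND events."""
    events = json.loads(payload)
    counts = Counter(e.get("eventType", "UNKNOWN") for e in events)
    destinations = [
        e["transitUri"]
        for e in events
        if e.get("eventType") == "SEND" and "transitUri" in e
    ]
    return counts, destinations


if __name__ == "__main__":
    counts, destinations = summarize(SAMPLE)
    print(dict(counts))
    print(destinations)
```

Run against real exported events, the same function would show whether every SEND destination is an `hdfs://` URI or whether anything went elsewhere.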

If not, could you elaborate on what you are trying to accomplish?

Thanks,
Koji

On Mon, Jun 12, 2017 at 6:25 AM, Shashi Vishwakarma
<[hidden email]> wrote:

> Hi
>
> I have an HDF cluster with 3 NiFi instances, which launch jobs (Hive/Spark)
> on an HDP cluster. Normally NiFi writes all of its information to several
> repositories on the local machine.
>
> My question is: does NiFi write any data or provenance information, or
> spill anything, onto HDP nodes (e.g. the data nodes in the HDP cluster)
> while accessing HDFS, Hive, or Spark services?
>
> Thanks
>
> Shashi

Re: HDF NiFi - Does NiFi write provenance/data on HDP nodes?

Shashi Vishwakarma
Hi Koji

I am trying to evaluate HDF NiFi from a security perspective. I want to make sure that when HDF NiFi talks to HDP, it does not leak or spill any kind of information onto HDP data nodes (i.e. onto their local disks). I am fine with it writing to HDFS.

On Thu, Jun 15, 2017 at 2:35 AM, Koji Kawamura <[hidden email]> wrote:
Hi Shashi,

Sorry for the delayed response. I am not aware of NiFi writing any
provenance information on HDP nodes. But if your goal is to expose
NiFi provenance data to HDFS, Hive (or Spark) so that you can analyze
it with those services, then SiteToSiteProvenanceReportingTask
might be helpful.

SiteToSiteProvenanceReportingTask can send provenance events in JSON
format. You can send them to a NiFi input port and then write them into
HDFS with the PutHDFS processor.

If not, could you elaborate on what you are trying to accomplish?

Thanks,
Koji

Re: HDF NiFi - Does NiFi write provenance/data on HDP nodes?

Bryan Bende
Hi Shashi,

This list is about Apache NiFi in general and is not specific to any
vendor distribution.

That being said, whichever node NiFi is running on, it will use that
node's local disk to store its internal repositories (FlowFile, content,
provenance).
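For a security review like the one described in this thread, the repository locations are set in `nifi.properties` on each NiFi node, under keys such as `nifi.flowfile.repository.directory`, `nifi.content.repository.directory.default`, and `nifi.provenance.repository.directory.default`. The minimal Python sketch below assumes simple `key=value` lines and lists every repository directory, so you can confirm they all point at the NiFi host's own disks. The sample values are illustrative defaults, not taken from the cluster in question.

```python
import sys

# Keys that point at NiFi's on-disk repositories. Content and provenance
# repositories may be spread over several directories whose keys use
# suffixes other than ".default", so matching is done on the prefix.
REPO_PREFIXES = (
    "nifi.flowfile.repository.directory",
    "nifi.content.repository.directory.",
    "nifi.provenance.repository.directory.",
)

# Illustrative excerpt; real files contain many more settings.
SAMPLE_PROPS = """
# Repository locations (excerpt)
nifi.flowfile.repository.implementation=org.apache.nifi.controller.repository.WriteAheadFlowFileRepository
nifi.flowfile.repository.directory=./flowfile_repository
nifi.content.repository.directory.default=./content_repository
nifi.provenance.repository.directory.default=./provenance_repository
nifi.web.http.port=8080
"""


def repository_dirs(properties_text: str) -> dict:
    """Return {property_key: directory} for every repository directory setting."""
    dirs = {}
    for raw in properties_text.splitlines():
        line = raw.strip()
        # Skip blanks, comments, and anything that is not key=value.
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        key, value = key.strip(), value.strip()
        if key.startswith(REPO_PREFIXES):
            dirs[key] = value
    return dirs


if __name__ == "__main__":
    # Pass the path to a real nifi.properties (e.g. conf/nifi.properties);
    # without an argument the built-in sample is used.
    if len(sys.argv) > 1:
        with open(sys.argv[1]) as fh:
            text = fh.read()
    else:
        text = SAMPLE_PROPS
    for key, path in repository_dirs(text).items():
        print(f"{key} -> {path}")
```

Relative paths such as `./content_repository` resolve against the NiFi installation directory on that node, which is consistent with the repositories living on NiFi's own local disk rather than on the HDP data nodes.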

When communicating with HDFS through the PutHDFS processor, NiFi
reads data from its content repository and sends it to the data
nodes of the HDFS cluster, the same as if you had installed the HDFS
command-line client and written a file to HDFS.

-Bryan


On Thu, Jun 15, 2017 at 9:51 AM, Shashi Vishwakarma
<[hidden email]> wrote:

> Hi Koji
>
> I am trying to evaluate HDF NiFi from a security perspective. I want to
> make sure that when HDF NiFi talks to HDP, it does not leak or spill any
> kind of information onto HDP data nodes (i.e. onto their local disks). I am
> fine with it writing to HDFS.