How to update line with modified data in Jython?

prabhu Mahendran
I have a CSV file with lakhs (hundreds of thousands) of rows. Below are some sample lines:

1,Ni,23,28-02-2015 12:22:33.2212-02
2,Fi,21,29-02-2015 12:22:34.3212-02
3,Us,33,30-03-2015 12:23:35-01
4,Uk,34,31-03-2015 12:24:36.332211-02
The last column of the CSV data is in the wrong datetime format, so I need to convert it to the default datetime format ("YYYY-MM-DD hh:mm:ss[.nnn]").

I tried the following script, which reads the lines and writes them into the flow file:

import json
import java.io
from org.apache.commons.io import IOUtils
from java.nio.charset import StandardCharsets
from org.apache.nifi.processor.io import StreamCallback

class PyStreamCallback(StreamCallback):
    def __init__(self):
        pass

    def process(self, inputStream, outputStream):
        text = IOUtils.readLines(inputStream, StandardCharsets.UTF_8)
        # Skip the first line and copy the rest through unchanged
        for line in text[1:]:
            outputStream.write(line + "\n")

flowFile = session.get()
if flowFile is not None:
    flowFile = session.write(flowFile, PyStreamCallback())
    flowFile = session.putAttribute(flowFile, "filename", flowFile.getAttribute('filename'))
    session.transfer(flowFile, REL_SUCCESS)
However, I cannot find a way to convert the data into the output below.

1,Ni,23,28-02-2015 12:22:33.221
2,Fi,21,29-02-2015 12:22:34.321
3,Us,33,30-03-2015 12:23:35
4,Uk,34,31-03-2015 12:24:36.332
I have searched Google for this and still cannot find a solution.

Can anyone guide me on converting this input data into the required output?

Re: How to update line with modified data in Jython?

Matt Burgess-2
Prabhu,

I'm no Python/Jython master by any means, so I'm sure there's a better
way to do this than what I came up with. Along the way I noticed some
things about the input data and Jython vs Python:

1) Your "for line in text[1:]:" skips the first line; I assume the
"real" data has a header?
2) The second row of data refers to a leap day (Feb 29), which did not
exist in 2015, so parsing it throws an exception. I changed all the
months to 03 and kept going.
3) Your third row doesn't have any fractional seconds. Is this on
purpose? I assumed so and tried to provide for that.
4) Jython (and Python 2) don't support the %z directive in strptime
formats, and %Z matches a time zone name (such as UTC or GMT), not a
+HHMM/-HHMM offset. Also, your data includes only the hour offset
(e.g. -02), not the minutes.
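Since %z is not available, one workaround for point 4 is to split the hour-only offset off with a regular expression before calling strptime. This is a minimal sketch, assuming the offset is always a sign followed by exactly two hour digits, as in the sample rows; split_offset and _OFFSET_RE are hypothetical names, not part of NiFi or Jython. It uses only the standard re module.

```python
import re

# Assumption: the value ends in "+HH" or "-HH" (hours only, no minutes),
# immediately after the last digit of the seconds field.
_OFFSET_RE = re.compile(r"^(.*\d)([+-])(\d{2})$")


def split_offset(value):
    """Split '28-02-2015 12:22:33.2212-02' into
    ('28-02-2015 12:22:33.2212', -2).

    Raises ValueError if no trailing +HH/-HH offset is present.
    """
    match = _OFFSET_RE.match(value.strip())
    if match is None:
        raise ValueError("no trailing +HH/-HH offset in %r" % value)
    local_part, sign, hours = match.groups()
    offset = int(hours)
    if sign == "-":
        offset = -offset
    return local_part, offset
```

Unlike a blind [:-3] slice, this fails loudly on a row without an offset, and it keeps the offset value in case it is needed later.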

I came up with a fairly fragile script that seems to work given your input:

import datetime
from org.apache.commons.io import IOUtils
from java.nio.charset import StandardCharsets
from org.apache.nifi.processor.io import StreamCallback

class PyStreamCallback(StreamCallback):
    def __init__(self, log):
        # Store the processor's logger on the instance (a bare
        # "logger = log" only created a local variable)
        self.logger = log

    def process(self, inputStream, outputStream):
        text = IOUtils.readLines(inputStream, StandardCharsets.UTF_8)
        # Skip the first line (assumed to be a header)
        for line in text[1:]:
            cols = line.split(",")
            # Drop the trailing 3-character offset (e.g. "-02") before parsing
            value = cols[3][:-3]
            df = "%d-%m-%Y %H:%M:%S.%f"
            trunc_3 = True
            try:
                d2 = datetime.datetime.strptime(value, df)
            except ValueError:
                # No fractional seconds (e.g. row 3): fall back to whole seconds
                df = "%d-%m-%Y %H:%M:%S"
                trunc_3 = False
                d2 = datetime.datetime.strptime(value, df)
            if trunc_3:
                # %f always prints six digits; keep the first three (milliseconds)
                cols[3] = d2.strftime(df)[:-3]
            else:
                cols[3] = d2.strftime(df)
            outputStream.write(",".join(cols) + "\n")

flowFile = session.get()
if flowFile is not None:
    flowFile = session.write(flowFile, PyStreamCallback(log))
    flowFile = session.putAttribute(flowFile, "filename", flowFile.getAttribute('filename'))
    session.transfer(flowFile, REL_SUCCESS)
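To try the per-line conversion outside NiFi, the same logic can be pulled into a plain function and run against the sample rows. This is a minimal sketch that mirrors the script above, assuming the timestamp is the last column and the offset is always three characters; convert_line and _FORMATS are hypothetical names.

```python
import datetime

# Formats tried in order: with fractional seconds first, then without.
_FORMATS = ("%d-%m-%Y %H:%M:%S.%f", "%d-%m-%Y %H:%M:%S")


def convert_line(line):
    """Rewrite the last CSV column: drop a 3-character offset such as
    "-02" and keep at most millisecond precision."""
    cols = line.rstrip("\r\n").split(",")
    value = cols[-1][:-3]  # assumes the offset is exactly 3 characters
    for fmt in _FORMATS:
        try:
            parsed = datetime.datetime.strptime(value, fmt)
        except ValueError:
            continue  # try the next format
        text = parsed.strftime(fmt)
        if fmt.endswith(".%f"):
            # %f always renders six digits, so dropping three leaves milliseconds
            text = text[:-3]
        cols[-1] = text
        return ",".join(cols)
    raise ValueError("unrecognised timestamp: %r" % cols[-1])
```

With the sample data, rows 1, 3 and 4 produce the expected output, while row 2 (29-02-2015) raises ValueError because that date does not exist.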


Please let me know if I've misunderstood anything, and I will try to
fix/improve the script.

Regards,
Matt

On Mon, Jun 19, 2017 at 8:31 AM, prabhu Mahendran wrote:
> [...]

Re: How to update line with modified data in Jython?

prabhu Mahendran

Thank you, Matt, for this response.

Yeah, it worked 😊

On 20-Jun-2017 7:55 AM, "Matt Burgess" wrote:
> [...]