processors ListFile/ListSFTP do not store milliseconds in timestamp


Roman
Hi there, I need help.

We are preparing a high-load project and tested these processors. The
listing.timestamp and processed.timestamp keys never contain milliseconds
(xxxxxxxxxx000). As a result, if several files are created within the same
second, not all of them are listed.
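
The effect can be reproduced with a small model. This is only a sketch under assumptions, not NiFi's actual code: it assumes a lister that remembers the newest timestamp it has emitted and, on the next run, emits only entries strictly newer than that value. The names `TruncatedTimestampDemo`, `truncateToSeconds`, and `listNewerThan` are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical model of a timestamp-based lister; not taken from the NiFi codebase.
final class TruncatedTimestampDemo {

    // Drops the millisecond part, mimicking stored values that end in 000.
    static long truncateToSeconds(long epochMillis) {
        return (epochMillis / 1000L) * 1000L;
    }

    // Returns the timestamps that are strictly newer than the stored one.
    static List<Long> listNewerThan(List<Long> timestamps, long lastSeen) {
        List<Long> result = new ArrayList<>();
        for (long t : timestamps) {
            if (t > lastSeen) {
                result.add(t);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        long firstFile = 1496311818200L;  // listed in the first run
        long secondFile = 1496311818700L; // created later in the same second

        // Millisecond precision: the second file is newer, so it is listed (prints 1).
        System.out.println(listNewerThan(Arrays.asList(secondFile), firstFile).size());

        // Both values truncated to seconds: they are equal, so it is skipped (prints 0).
        System.out.println(listNewerThan(
                Arrays.asList(truncateToSeconds(secondFile)),
                truncateToSeconds(firstFile)).size());
    }
}
```

Under this assumed model, every file written later in the same second as the last listed file is invisible to the next run, which would be consistent with a partial count such as 3952 out of 10000.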


Test:
1. Start the ListFile/ListSFTP processor.
2. Generate 10,000 zero-size files. My command:
   for i in {1..10000}; do touch ./test_$i; done
3. Check the processor stats: Out 3952 (0 bytes)


Am I doing something wrong, or is this a bug in NiFi, Java, etc.?

Environment

Ubuntu 14.04.5 LTS, x64, ext4 file system
Java HotSpot(TM) 64-Bit Server VM (build 25.131-b11, mixed mode)
NiFi 1.2.0, from 3a605af, tagged nifi-1.2.0-RC2


Thanks
Roman



--
View this message in context: http://apache-nifi-developer-list.39713.n7.nabble.com/processors-ListFile-ListSFTP-do-not-store-milliseconds-in-timestamp-tp16037.html
Sent from the Apache NiFi Developer List mailing list archive at Nabble.com.

Re: processors ListFile/ListSFTP do not store milliseconds in timestamp

Koji Kawamura
Hello Roman,

It seems the resolution of the last-modified timestamp depends on the
file system implementation.
https://stackoverflow.com/questions/3805201/how-to-get-ubuntu-file-timestamp-in-millisecond

I reproduced the same behavior on OS X, which uses HFS and has the same
one-second resolution limit.
https://stackoverflow.com/questions/18403588/how-to-return-millisecond-information-for-file-access-on-mac-os-x-in-java

Which file system are you using on Ubuntu? If it is ext3, then
switching to ext4 may address the issue.

Thanks,
Koji


Re: processors ListFile/ListSFTP do not store milliseconds in timestamp

Roman
Hello Koji,

Thanks for the answer. I am aware of that and I use ext4; stat reports the
full precision: 2017-06-01 10:10:18.783447047

Thanks,
Roman




Re: processors ListFile/ListSFTP do not store milliseconds in timestamp

Roman
In reply to this post by Koji Kawamura
Hi there,

While digging into this issue, I found an open JIRA issue, NIFI-3332
<https://issues.apache.org/jira/browse/NIFI-3332>. Could it be related to my
situation with the missing milliseconds?

Thanks
Roman



Re: processors ListFile/ListSFTP do not store milliseconds in timestamp

Koji Kawamura
Hi Roman,

I think NIFI-3332 is probably related, as I can see that the timestamps in
the logs don't have milliseconds.

I've been considering how we can support all corner cases with minimal
persisted state, and make it work even if the filesystem only provides
the last-modified timestamp with seconds precision.
I'm changing the code and testing locally, but it's not ready to be sent
as a PR yet, and I am not fully confident about how to fix it.

Any suggestions or insights that would make these ListXXXX processors
better would be appreciated.

Thanks,
Koji


Re: processors ListFile/ListSFTP do not store milliseconds in timestamp

Koji Kawamura
Hi Roman and all,

As I investigated the ListFile processor further, I found that these are
two different issues.
I also found another JIRA related to ListFile. Currently there seem to
be three issues:

1. ListFile can miss files on filesystems that do not provide
timestamps with millisecond precision (NIFI-4096)
2. ListFile can miss files that have the same timestamp as the
previously processed latest timestamp (NIFI-3332)
3. ListFile cannot pick up files whose timestamp is older than the
previously processed latest timestamp (NIFI-2383)

# NIFI-4096
I created JIRA NIFI-4096 to address issue #1 above by adding
deterministic logic to detect the target filesystem's timestamp precision.
With NIFI-4096, ListFile can list all 10,000 files created by the
command you shared earlier without missing any:

```
for i in {1..10000}; do touch ./test_$i; done
```

The PR is ready for review. I would appreciate it if you could test the
fix with your use case.

Additionally, I renamed variables in AbstractListProcessor to better
explain their purpose and timestamp unit. I hope this makes the code
more readable and maintainable.
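
One way such precision detection could work is sketched below. This is an assumption for illustration only and is not taken from the PR: it inspects the timestamps of the listed entries and picks the finest unit in which any of them has a non-zero component. `PrecisionDetector` and `detect` are hypothetical names.

```java
import java.util.concurrent.TimeUnit;

// Hypothetical heuristic; the actual logic in the PR may differ.
final class PrecisionDetector {

    // Guesses the timestamp precision from a sample of epoch-millisecond values.
    static TimeUnit detect(long[] epochMillis) {
        boolean sawNonZeroSeconds = false;
        for (long t : epochMillis) {
            if (t % 1000L != 0) {
                // A non-zero millisecond part proves millisecond precision.
                return TimeUnit.MILLISECONDS;
            }
            if (t % 60_000L != 0) {
                // Whole seconds, but not aligned to a minute boundary.
                sawNonZeroSeconds = true;
            }
        }
        return sawNonZeroSeconds ? TimeUnit.SECONDS : TimeUnit.MINUTES;
    }
}
```

With a coarser detected unit, a lister could hold back entries that fall into the most recent second (or minute) until that interval has passed, so that files still being written with the same timestamp are not skipped. A heuristic like this can guess too coarsely when the sample happens to contain only round values, which only makes the lister more conservative.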

# NIFI-3332
I'm thinking about adding a processor property that specifies whether to
track the listed filenames along with the latest processed timestamp.
Although it would be less efficient, it would be good for some use cases.
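
A minimal sketch of that idea, assuming the state consists of the latest processed timestamp plus the set of names already listed at exactly that timestamp. `LatestTimestampTracker` and `listNew` are hypothetical names, and the directory is modeled as a plain map from file name to timestamp.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch; not the design that was eventually implemented in NiFi.
final class LatestTimestampTracker {
    private long latestTimestamp = Long.MIN_VALUE;
    private final Set<String> namesAtLatest = new HashSet<>();

    // Returns names that are newer than the stored timestamp,
    // or equal to it but not yet listed, in sorted order.
    List<String> listNew(Map<String, Long> directory) {
        List<String> fresh = new ArrayList<>();
        for (Map.Entry<String, Long> entry : directory.entrySet()) {
            long t = entry.getValue();
            boolean newer = t > latestTimestamp;
            boolean sameButUnseen = t == latestTimestamp
                    && !namesAtLatest.contains(entry.getKey());
            if (newer || sameButUnseen) {
                fresh.add(entry.getKey());
            }
        }
        // Update the state: keep only the names that share the newest timestamp.
        for (String name : fresh) {
            long t = directory.get(name);
            if (t > latestTimestamp) {
                latestTimestamp = t;
                namesAtLatest.clear();
            }
            if (t == latestTimestamp) {
                namesAtLatest.add(name);
            }
        }
        Collections.sort(fresh);
        return fresh;
    }
}
```

The extra cost is the set of names, which grows with the number of files sharing the newest timestamp. That matches the trade-off mentioned above: less efficient, but files arriving with an already-seen timestamp are no longer dropped.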

# NIFI-2383
This is the most difficult case to handle correctly with timestamps alone.
We would need a different processor that can use a watch API.
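
For reference, the JDK's own watch API is `java.nio.file.WatchService`. The sketch below shows how creation events could be collected with it; it is an illustration under assumptions, not a NiFi processor, and `DirectoryWatchDemo` and `pollCreated` are hypothetical names.

```java
import java.io.IOException;
import java.nio.file.FileSystems;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardWatchEventKinds;
import java.nio.file.WatchEvent;
import java.nio.file.WatchKey;
import java.nio.file.WatchService;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.TimeUnit;

// Hypothetical helper that reports created entries regardless of their timestamps.
final class DirectoryWatchDemo {

    // Waits up to timeoutSeconds for one batch of events and returns created names.
    static List<String> pollCreated(WatchService watcher, long timeoutSeconds)
            throws InterruptedException {
        List<String> created = new ArrayList<>();
        WatchKey key = watcher.poll(timeoutSeconds, TimeUnit.SECONDS);
        if (key == null) {
            return created; // nothing happened within the timeout
        }
        for (WatchEvent<?> event : key.pollEvents()) {
            if (event.kind() == StandardWatchEventKinds.ENTRY_CREATE) {
                created.add(event.context().toString());
            }
        }
        key.reset(); // re-arm the key so later events are delivered
        return created;
    }

    public static void main(String[] args) throws IOException, InterruptedException {
        Path dir = Files.createTempDirectory("watch-demo");
        try (WatchService watcher = FileSystems.getDefault().newWatchService()) {
            dir.register(watcher, StandardWatchEventKinds.ENTRY_CREATE);
            Files.createFile(dir.resolve("test_1"));
            System.out.println(pollCreated(watcher, 30));
        }
    }
}
```

Because events are delivered per created entry, a file with an old timestamp is still reported. The drawbacks are that events are only seen while the watcher is running and that delivery latency depends on the platform.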

Any comments would be appreciated.

Thanks,
Koji


Re: processors ListFile/ListSFTP do not store milliseconds in timestamp

Joe Skora
Koji and Roman,

Sorry to jump in here late; I meant to follow up last week.

I created NIFI-3332 because of issue #2: when ListFile fires between the OS
writes of a batch of files, files with the same timestamp that the OS
writes after the processor fired are missed.  I suspect #1 is an
amplification of #2, where one-second resolution will unfortunately increase
both the potential collision rate and the potential state to be tracked
1,000-fold each.

I have a harder time with #3 as I understand the opinion that it's a new
file if I just wrote it, even if I kept the old timestamp.  But NiFi has to
use a discrete means to identify new files and I think it is reasonable to
use file timestamps, especially since this scenario can be mitigated by
updating the file timestamp.  It could be possible to use a combination of
modification and creation times (where both are available) to minimize
potential misses, but I don't think #3 is as likely as #1 and #2 once the
logic is understood, especially since a workaround is fairly easy.
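
Both ideas in this paragraph can be sketched with `java.nio.file`. The code below is an illustration under assumptions rather than anything ListFile does: `touch` shows the workaround of updating the modification time, and `newestTimestamp` shows the combination of modification and creation time. The class and method names are hypothetical.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.attribute.BasicFileAttributes;
import java.nio.file.attribute.FileTime;

// Hypothetical helpers for the workaround and the combined-timestamp idea.
final class FileTimestampWorkaround {

    // Sets the modification time to nowMillis and returns the value read back,
    // which the filesystem may have truncated to a coarser precision.
    static long touch(Path file, long nowMillis) throws IOException {
        Files.setLastModifiedTime(file, FileTime.fromMillis(nowMillis));
        return Files.getLastModifiedTime(file).toMillis();
    }

    // Returns the later of modification and creation time, in epoch milliseconds.
    // Where creation time is unsupported, the JDK returns an implementation-specific
    // default, typically the last-modified time or the epoch.
    static long newestTimestamp(Path file) throws IOException {
        BasicFileAttributes attrs = Files.readAttributes(file, BasicFileAttributes.class);
        return Math.max(attrs.lastModifiedTime().toMillis(),
                attrs.creationTime().toMillis());
    }
}
```

Calling `touch(file, System.currentTimeMillis())` after copying a file that kept its old timestamp makes it newer than the stored state again, which is the mitigation described above.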

I think a ListXXX processor that tracks events from Linux inotify and/or
Windows FileSystemWatcher (or similar) services would be a great
addition, but the simplicity of ListFile would still be useful if I could
trust it to not silently drop files.

I hope that helps.

Regards,
Joe


Re: processors ListFile/ListSFTP do not store milliseconds in timestamp

Roman
In reply to this post by Koji Kawamura
Hello Koji,

Thanks for NIFI-4069 (not NIFI-4096 =))

I tested your PR in several ways on this version: from a0f2834 on branch
nifi-4069

Test 1:
1. Set Target System Timestamp Precision: Auto Detect
2. Start ListFile
3. Run the script: for i in {1..10000}; do touch ./test_$i; done

Result: no missing files


Test 2:
1. Set Target System Timestamp Precision: Milliseconds
2. Start ListFile
3. Run the script: for i in {1..10000}; do touch ./test_$i; done

Result: some files are missing


Test 3 and 4 (100k files):
1. Set Target System Timestamp Precision: Auto Detect
2. Start ListFile
3. Run the script: for i in {1..100000}; do touch ./test_$i; done

Result: 68 and 40 files missing, respectively


In all tests, listing.timestamp and processed.timestamp still do not have
milliseconds.



Summary:
1. It is now much better than it was. Thanks, Koji, for the good job!
2. I still do not see milliseconds, although my ext4 file system shows the
modification date in nanoseconds



Re: processors ListFile/ListSFTP do not store milliseconds in timestamp

Koji Kawamura
Thanks Joe, I agree with the idea of making ListXXX as reliable
as possible. Once that is done, I'm also interested in providing a different
mechanism based on watch APIs to cover use cases that ListXXX can't handle
with timestamps alone.

Roman, thanks for testing the change.
The results of Test 1 and Test 2 are expected.
Test 3 may have been affected by the issue reported in
NIFI-3332 (files having the same timestamp as files processed in the
previous cycle). I'll look into whether there's anything we can do.

> 2. Still do not see milliseconds, however my ext4 file system show modify date in nanoseconds

Roman, would you try creating a simple Java program to see whether the
issue resides in the NiFi codebase or in the native code for your environment?
A similar issue is reported on Stack Overflow:
https://stackoverflow.com/questions/24804618/get-file-mtime-with-millisecond-resolution-from-java

If the simple program returns timestamps with milliseconds, we should
fix something in NiFi.
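
Such a program could look like the sketch below. It prints the last-modified time of a file through both the legacy `java.io.File` API and the `java.nio.file.Files` API, so the two values can be compared with the output of stat. Which of these calls ListFile uses is not stated in this thread; `LastModifiedCheck`, `viaIo`, and `viaNio` are hypothetical names.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical diagnostic program; pass the path of a file to inspect.
final class LastModifiedCheck {

    // Last-modified time via the legacy java.io API, in epoch milliseconds.
    static long viaIo(Path file) {
        return file.toFile().lastModified();
    }

    // Last-modified time via the java.nio.file API, in epoch milliseconds.
    static long viaNio(Path file) throws IOException {
        return Files.getLastModifiedTime(file).toMillis();
    }

    public static void main(String[] args) throws IOException {
        // Falls back to the current directory when no argument is given.
        Path file = Paths.get(args.length > 0 ? args[0] : ".");
        System.out.println("java.io.File.lastModified(): " + viaIo(file));
        System.out.println("Files.getLastModifiedTime(): " + viaNio(file));
    }
}
```

If one value ends in 000 while the other does not, the truncation happens in that JDK call rather than in the filesystem. If both end in 000 although stat shows sub-second digits, the JVM or its native layer is the more likely place to look.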

I really appreciate your feedback! Thanks!
Koji
