Uncompressing nested tar, tar.gz, gz, and zip files

James McMahon
Hello. I have incoming directories of files that contain compressed files (tar, gz, zip, gzip, etc.) nested to an arbitrary depth. The top level always arrives as a tar, but from that point on the extracted contents may or may not include further compressed files. My initial uncompress of the top-level tar may simply return regular files to me.

Has anyone developed a workflow to handle compressed files nested to an indeterminate depth? My goal is to uncompress everything so that I end up with a set of atomic files to work with.

In my current workflow I use repeated chains of IdentifyMimeType -> RouteOnAttribute -> (isCompressed is true) -> UnpackContent.
This works, but it is not practical to anticipate in such a fixed manner how many levels of embedded compressed files I may have to handle.

Thanks in advance for your help. -Jim

Re: Uncompressing nested tar, tar.gz, gz, and zip files

Mark Payne
Jim,

I would recommend not repeating chains of those processors, but rather creating a loop:

IdentifyMimeType [1] -> RouteOnAttribute -> gzip ?       -> CompressContent (Mode = decompress) -> Back to IdentifyMimeType [1]
                                         -> tar or zip ? -> UnpackContent -> Back to IdentifyMimeType [1]
                                         -> other ?      -> [Continue on through rest of your flow]


Does that make sense?

Thanks
-Mark
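
For readers who want to see the logic of the loop outside NiFi, here is a minimal Python sketch of the same idea: identify each item, decompress or unpack it if needed, and feed the results back to the identification step until only regular files remain. It is an illustration, not NiFi code. The function names `identify` and `unpack_all` are hypothetical, and the magic-byte check is only a rough stand-in for IdentifyMimeType (which relies on Apache Tika and recognizes far more formats).

```python
import gzip
import io
import tarfile
import zipfile
from collections import deque


def identify(data: bytes) -> str:
    """Rough stand-in for IdentifyMimeType: classify content by magic bytes."""
    if data[:2] == b"\x1f\x8b":
        return "gzip"
    if data[:4] == b"PK\x03\x04":
        return "zip"
    # POSIX and GNU tar headers carry the magic "ustar" at offset 257.
    if data[257:262] == b"ustar":
        return "tar"
    return "other"


def unpack_all(name: str, data: bytes) -> list[tuple[str, bytes]]:
    """Return (name, content) pairs for every non-archive file found inside."""
    queue = deque([(name, data)])  # items waiting to be identified
    atomic = []                    # items that took the "other" route
    while queue:
        item_name, payload = queue.popleft()
        kind = identify(payload)
        if kind == "gzip":
            # One stream in, one stream out, then back to identification.
            queue.append((item_name.removesuffix(".gz"), gzip.decompress(payload)))
        elif kind == "tar":
            # mode="r:" reads an uncompressed tar only; gzip is handled above.
            with tarfile.open(fileobj=io.BytesIO(payload), mode="r:") as tar:
                for member in tar:
                    if member.isfile():
                        content = tar.extractfile(member).read()
                        queue.append((f"{item_name}/{member.name}", content))
        elif kind == "zip":
            with zipfile.ZipFile(io.BytesIO(payload)) as zf:
                for info in zf.infolist():
                    if not info.is_dir():
                        queue.append((f"{item_name}/{info.filename}", zf.read(info)))
        else:
            atomic.append((item_name, payload))
    return atomic
```

Calling `unpack_all("outer.tar", raw_bytes)` returns the atomic files no matter how deeply the archives were nested, because every unpacked item re-enters the queue instead of passing through a fixed number of stages. The sketch holds everything in memory and puts no limit on depth or size, so it is not safe for untrusted input, and it requires Python 3.9 or later for `str.removesuffix`.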


> On Jun 15, 2017, at 9:48 AM, James McMahon <[hidden email]> wrote:
> [...]


Re: Uncompressing nested tar, tar.gz, gz, and zip files

James McMahon
I think so, Mark. I will try this and follow up promptly with questions if need be. Thank you once again for your help. -Jim

On Thu, Jun 15, 2017 at 9:53 AM, Mark Payne <[hidden email]> wrote:
> [...]

