How to Process Large Flat Files as a Source Schema
EXTOL Business Integrator (EBI) sometimes has issues processing large files. Below is how we can build a special process to handle these situations.
I received a Java program from EXTOL Support called "FlatFileDecomp".
This program not only breaks one large file into smaller files that EBI can handle, but also eliminates records that we do not want to process. For example, we took an 852 MB flat file, broke it into files of 8,000 records each, and eliminated all records except the DTL records. The output was 103 files totaling 221 MB. That alone reduces the volume of data we pass into EBI.
Here are the program specs:
The program has four input parameters:
- Input File
- Number of records for each new flat file
- Output folder
- Formats to omit or preserve (-o = omit, -p = preserve)
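To make the four parameters concrete, here is a minimal Java sketch of how a splitter of this kind could work. It is not the FlatFileDecomp source. It assumes that each line is one record and that the record format is the first three characters of the line (as with DTL or HDR). The class and method names are hypothetical.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Set;

// Illustrative sketch only; this is NOT the FlatFileDecomp source code.
class FlatFileSplitSketch {

    // Assumption: the record format is the first three characters of each line.
    static String formatOf(String line) {
        return line.length() >= 3 ? line.substring(0, 3) : line;
    }

    // preserve = true keeps only the listed formats (like -p);
    // preserve = false drops the listed formats (like -o).
    // Returns the counts {read, ignored, written}.
    static long[] split(Path input, int recordsPerFile, Path outDir,
                        Set<String> formats, boolean preserve) throws IOException {
        long read = 0, ignored = 0, written = 0;
        int fileIndex = 0;
        BufferedWriter out = null;
        // ISO-8859-1 maps every byte to a character, so unknown encodings do not fail the read.
        try (BufferedReader in = Files.newBufferedReader(input, StandardCharsets.ISO_8859_1)) {
            String line;
            while ((line = in.readLine()) != null) {
                read++;
                boolean listed = formats.contains(formatOf(line));
                if (listed != preserve) {
                    ignored++;
                    continue;
                }
                // Start a new output file every recordsPerFile kept records.
                if (written % recordsPerFile == 0) {
                    if (out != null) {
                        out.close();
                    }
                    fileIndex++;
                    Path target = outDir.resolve(input.getFileName() + "_" + fileIndex);
                    out = Files.newBufferedWriter(target, StandardCharsets.ISO_8859_1);
                }
                out.write(line);
                out.newLine();
                written++;
            }
        } finally {
            if (out != null) {
                out.close();
            }
        }
        return new long[] {read, ignored, written};
    }
}
```

Because the input is streamed one line at a time, memory use stays flat regardless of the input file size, which is the point of splitting before handing the data to EBI.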
EXAMPLE 1:
- Input File = C:\parse\in.txt
- Number of records for each new flat file = 8000
- Output folder = C:\parse\
- Formats to omit or preserve = omit AUD,HDR,LIN,NAM,SUM
C:\parse>java -cp ./FlatFileDecomp.jar FlatFileDecomp C:\parse\in.txt 8000 C:\parse\ -oAUD,HDR,LIN,NAM,SUM
We should see this output in the command console…
Starting FlatFileDecomp:
- Input: C:\parse\in.txt
- Records: 8000
- Output: C:\parse\
- Ommitting: AUD,HDR,LIN,NAM,SUM
Read: 1628565 record(s).
Ignored: 812375 record(s).
Wrote: 816190 record(s).
EXAMPLE 2:
- Input File = C:\parse\in.txt
- Number of records for each new flat file = 8000
- Output folder = C:\parse\
- Formats to omit or preserve = preserve DTL
C:\parse>java -cp ./FlatFileDecomp.jar FlatFileDecomp C:\parse\in.txt 8000 C:\parse\ -pDTL
Starting FlatFileDecomp:
- Input: C:\parse\in.txt
- Records: 8000
- Output: C:\parse\
- Preserving : DTL
Read: 1628565 record(s).
Ignored: 812375 record(s).
Wrote: 816190 record(s).
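The counts in the console output can be sanity-checked with a little arithmetic: the records read should equal ignored plus written, and the written records divided by 8000 (rounded up) should give the number of output files. A small Java sketch of that check, with a hypothetical class name:

```java
// Sanity check of the console counts; class and method names are illustrative.
class SplitCountCheck {

    // Ceiling division: number of files needed to hold `records` at `perFile` each.
    static long filesNeeded(long records, long perFile) {
        return (records + perFile - 1) / perFile;
    }

    public static void main(String[] args) {
        long ignored = 812375L;
        long written = 816190L;
        System.out.println(ignored + written);          // 1628565, matches "Read"
        System.out.println(filesNeeded(written, 8000L)); // 103 output files
    }
}
```

The result of 103 files agrees with the file count reported for the 852 MB example.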
NOTES:
Be careful: the program writes its output files to whatever folder you specify, and it names them by appending a number to the input file name. For example, if the input file is in.txt, the output files will be in.txt_1, in.txt_2, in.txt_3, etc. If files with those names already exist, they will be overwritten. So you will need to work out your timing: how you will feed files to the system, where you will write the output each time, and how you will then move the files into EBI.
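One way to guard against the overwrite problem is to check the output folder for files left over from an earlier run, or to give every run its own folder. The following Java sketch shows both ideas. It is only an illustration: the class name, method names, and the "run_" folder naming are assumptions, not part of FlatFileDecomp.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;

// Hypothetical helper; not part of FlatFileDecomp.
class OutputFolderGuard {

    // True if outDir already holds files named like <inputName>_1, <inputName>_2, ...
    static boolean hasEarlierOutput(Path outDir, String inputName) throws IOException {
        String prefix = inputName + "_";
        try (DirectoryStream<Path> files = Files.newDirectoryStream(
                outDir, p -> p.getFileName().toString().startsWith(prefix))) {
            return files.iterator().hasNext();
        }
    }

    // Creates a per-run subfolder such as run_20200102_030405 (naming is an assumption).
    static Path newRunFolder(Path base, LocalDateTime now) throws IOException {
        String stamp = now.format(DateTimeFormatter.ofPattern("yyyyMMdd_HHmmss"));
        return Files.createDirectories(base.resolve("run_" + stamp));
    }
}
```

A calling process could refuse to start while hasEarlierOutput returns true, or pass the folder returned by newRunFolder as the output folder parameter so that each run writes to a fresh location.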
The Java program can currently be called from a Windows command prompt for testing purposes. We can also build a batch file to feed files to it. Such a batch file can pass parameters into the Java program so that the process stays flexible. However, some scripting within the Windows OS will be required on your side to make that happen.
NOTE: If you are unable to obtain this Java program, please contact me.
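As an alternative to a batch file, the same parameter passing can be sketched as a small Java launcher that assembles the command line used in the examples and runs it. This is a sketch under assumptions: the launcher class is hypothetical, it expects FlatFileDecomp.jar to be available at the path you pass in, and it does not know what exit codes FlatFileDecomp returns.

```java
import java.io.IOException;
import java.util.List;

// Hypothetical launcher; it only wraps the command line shown in the examples.
class FlatFileDecompLauncher {

    // Builds: java -cp <jar> FlatFileDecomp <input> <records> <outDir> <formatFlag>
    static List<String> buildCommand(String jar, String input, int records,
                                     String outDir, String formatFlag) {
        return List.of("java", "-cp", jar, "FlatFileDecomp",
                input, Integer.toString(records), outDir, formatFlag);
    }

    // Runs the command, showing its console output, and returns the exit code.
    static int run(List<String> command) throws IOException, InterruptedException {
        Process process = new ProcessBuilder(command).inheritIO().start();
        return process.waitFor();
    }

    public static void main(String[] args) throws Exception {
        if (args.length != 5) {
            System.err.println("Usage: <jar> <input> <records> <outDir> <-oA,B|-pA,B>");
            System.exit(2);
        }
        List<String> command = buildCommand(
                args[0], args[1], Integer.parseInt(args[2]), args[3], args[4]);
        System.exit(run(command));
    }
}
```

For example, passing ./FlatFileDecomp.jar, C:\parse\in.txt, 8000, C:\parse\ and -pDTL reproduces the command from Example 2.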
By: Sean Hoppe on