Mid-Michigan Computer Consultants - Bay City, Michigan
 


Mid-Michigan Computer Consultants
509 Center
Bay City, Michigan

Sales (989) 892-9242
Support (989) 686-8860

Plb-0312.cfm v1.0


 

ANSI Standard PL/B Language and Visual PL/B

File READ Notes


This section discusses various file reading techniques. Topics on this page include:
  • READ tips and techniques


READING FILES:

Reading sequential files is very simple. Just code:
      READ {filename},{mode};{data elements}
      READ TESTFILE,SEQ;NAME, ADDRESS, CITY


After each READ, check the OVER flag to see whether the end of the file was reached.
Taken from the Sunbelt Technical Support web board:
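A minimal sketch of that sequential read loop, in the same style as the code later on this page. The file name TEST.TXT and the field sizes are assumptions for illustration only:

```plb
. Sketch only: TEST.TXT and the field sizes are assumptions.
TESTFILE FILE
TESTNAME INIT      "TEST.TXT"
NAME     DIM       30
ADDRESS  DIM       40
CITY     DIM       20
.
        OPEN      TESTFILE,TESTNAME
        LOOP
            READ      TESTFILE,SEQ;NAME,ADDRESS,CITY
. OVER is set when the READ reaches the end of the file.
            IF        OVER
                BREAK
            ENDIF
. Process NAME, ADDRESS and CITY here.
        REPEAT
        CLOSE     TESTFILE
```

This is the kind of text-mode read that tripped up the thread below: the stray 0x1A in the data made the READ report OVER long before the real end of the file.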
Subject: Partial read of a file
#19765
4/23/2012

I have a tab-delimited text file that a client downloads from another vendor. The latest file reads about 450 records out of a few thousand and then gets an OVER condition.

I can read the entire file in NOTEPAD and in EXCEL without a problem. The "TYPE" command line utility stops at the same place that PL/B does.

My main program reads using *EDION. I've written a tiny test program that reads as just a flat file. Both stop at the same place.

Using a home grown PL/B utility that reads byte at a time with ABSON, I can go right through the stopper. I see nothing unusual about the data at that point.

I read the file in NOTEPAD and saved it to another file. PL/B still stops at the same point, although NOTEPAD shows the entire thing.

Any ideas?
Stephen Kent


Re: No end-of-file character (0x1A) hiding at that point of the file?

I suspect the same thing or some other non-ascii character.

I just ran into a similar issue where one of our production text files did not display correctly in my View program. I suspected a non-ASCII character was the cause, and the PLB Utility Suite showed there were extra carriage returns in the file. (Thanks Robert and Lee! http://visualplb.com/plbutilitysuite/download/)

I then opened the file in our text editor (SlickEdit) and did a search on the carriage return character.

In your case, I would be curious whether the PLB Utility Suite stops at the same spot your program stops or reads all the way through.

Gerhard Weiss


Stephen sent us the file and we analyzed it. There was a 0x1A character in the 452nd record. We rewrote the test program to show how to use ABSON to read the entire file in one READ, then REPLACE to get rid of the 0x1A and 0x0D characters, and then process the file using the EXPLODE statement with 0x0A as the delimiter. It now processes all records.

Steve White, Sunbelt


Thanks for sharing the solution. I always like these clever ideas on how to fix a file. I could also see how the SQUEEZE could be used to get rid of the 0x1A and 0x0D characters.

Gerhard Weiss


Thanks to Steve White's quick response, we know the problem: a hex 1A in a person's address in the FTP file we received. Who knows how it got there. Now that we can identify the account, the vendor on the other end can clean it up. It must have snuck in recently, because previous downloads didn't have it, but it continued to come to us when we asked for new downloads.

I didn't notice the 1A in my hex dump since I wasn't looking for it. In my old "green card" and other hex charts the description is blank or "SUB". The EOF meaning must be a Microsoft consideration.

Sidetrack: Did you ever notice how hex tables have disappeared? I got on a ladder to look at my old DOS books, which have a thick layer of dust on them. Only a few had any hex/ASCII tables, and none had the 1A other than as SUB. Fortunately, an internet search uncovered a number of 1A references.

Stephen Kent


This doesn't explain the 0x1A as end-of-file either, but I use this ASCII table.

There is also the "Translation Table" found in the PL/B runtime Reference.

Stuart Elliott


1. The 0x1A character as an EOF marker is a holdover from the DOS operating system.

2. Stephen was seeing 'SUB' associated with 0x1A. Here is a link where you can start to get more information about 0x1A as an EOF: www.wikipedia.org/wiki/Substitute_character

Ed Boedecker, Sunbelt


This technique is not just for fixing a file. Think of the simple task that we all do in many programs of reading a file sequentially from start to finish. PL/B has a default buffer for sequential files of 256 bytes. If you have a 100k file, then it takes 400 physical reads to read the entire file. With this technique you can read in the entire file in ONE physical read and then all processing is done internally and the program never has to hit the disk again for that file. Just think of the speed improvements that could be made to many programs. Here is a sample template for such a process:

Steve White, Sunbelt
.--------------------------------------------------------
.
File     FILE
FileName DIM       50
Size     FORM      10
FileData DIM       ^
Zero     FORM      "0"
CR_Blank INIT      0x0D," ",0x1A," "
LF       INIT      0x0A
EOF      FORM      1
. Record buffer; size it for the longest record in the file.
FileRecord DIM     500
.--------------------------------------------------------
. Open file, get file size and create data buffer.
.
        OPEN File,FileName
        POSITEOF File
        FPOSIT File,Size
        SMAKE FileData,Size
.--------------------------------------------------------
. Read entire file into working buffer. Start reading at ZERO
.
        READ File,Zero;*ABSON,FileData,*ABSOFF
        CLOSE File
.--------------------------------------------------------
. Replace Carriage Return and EOF marks with blanks
.
        REPLACE CR_Blank,FileData
        LOOP
.--------------------------------------------------------
. Move all data up to next Line Feed to record buffer.  If End of
. String was encountered while moving data, the ZERO flag is set
.
. If set, then set a flag that EOF was found since we still have a
. good record to handle.
.
        EXPLODE FileData,LF,FileRecord
        IF ZERO
            MOVE "1",EOF
        ENDIF
.--------------------------------------------------------
. Process the data here.  Do whatever needs to be done.
.
.        .....
.--------------------------------------------------------
. Continue with loop until the EOF flag is set.
.
        REPEAT UNTIL ( EOF = 1 )



Most of our file access is done using ISAM files, but there is one spot where I can use this. It is a fixed-length file, so I will not need to do a

REPLACE CR_Blank,FileData

I will probably also replace the SMAKE with DMAKE/DFREE logic. SMAKE allows for 32MB DIM variables, whereas DMAKE allows for 2GB. Of course, I am sure that reading a 2GB file into memory would kill the performance of my system.

The other thing is that I am using SUNSORT to create this file, so there is a good chance it is already in the disk cache.

Gerhard Weiss
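A minimal sketch of the DMAKE/DFREE variant described above, for a fixed-length file where the REPLACE step is skipped. It assumes DMAKE takes the same operands as SMAKE (variable, size) and that DFREE takes the variable to release; the file name FIXED.TXT is hypothetical. Check the PL/B Language Reference before relying on it:

```plb
. Sketch only: assumes DMAKE/DFREE take the same operands as SMAKE
. (variable, size) and that FIXED.TXT is a fixed-length file.
File     FILE
FileName INIT      "FIXED.TXT"
Size     FORM      10
FilePos  FORM      "0"
FileData DIM       ^
.
        OPEN      File,FileName
        POSITEOF  File
        FPOSIT    File,Size
        DMAKE     FileData,Size
        READ      File,FilePos;*ABSON,FileData,*ABSOFF
        CLOSE     File
. Fixed-length records, so the REPLACE step is skipped here.
. Process FileData here (for example, with an EXPLODE loop).
        DFREE     FileData
```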


If you're going to use EXPLODE to read through the "file", then you'll need a single delimiter and, in Steve's example, LF is it. So he got rid of the CRs because they became superfluous.

If you're going to handle the Form Pointer yourself and not use EXPLODE then you can do whatever you want.

But I think EXPLODE in the LOOP/REPEAT is one of the best features of this technique. The "bestest" feature, of course, is doing it all in RAM.

I don't remember using the POSITEOF/FPOSIT technique; I use FINDFILE to get the size of the file to DMAKE the variable. I suppose one is more efficient than the other.

--Stuart Elliott


I was thinking of a fixed-length file where the carriage return would not be read into the variable because the variable is smaller. For example, in the code below lowercase 'c' stands for CR and lowercase 'l' for LF. The record is 10 bytes long, so the variable being read into is a DIM 10. The EXPLODE will transfer 11 bytes including the CR, but only the first 10 are placed in the variable.
EXP1REC  DIM       10
EXP1DATA INIT      "1234567890cl":
                   "ABCDEFGHIJcl":
                   "1234567890cl":
                   "ABCDEFGHIJcl"
.
EOFSW    FORM      1
.
        LOOP
            EXPLODE   EXP1DATA,"l",EXP1REC
            IF        ZERO
               MOVE      "1",EOFSW
            ENDIF
            DISPLAY   EXP1REC,"<"
        REPEAT    UNTIL (EOFSW=1)

Steve was working with a variable-length record that had tab-separated fields. This would need two EXPLODEs: one for the record and one for the fields in the record. Even there, instead of replacing the CR with a space, you could use the CR as a delimiter in the second EXPLODE. Doing this would not add a space to the last field.

Here is an example of a variable-length record with comma-separated fields. Notice that the second EXPLODE has two delimiters: comma and 'c'.

EXP2REC  DIM       30
EXP2DATA INIT      "123,ABC,456,DEFcl":
                   "ABC,1234,DEF,5678cl":
                   "123,ABC,456,DEFGHcl":
                   "AB,123,D,45678cl"
.
EXP2VL   LIST
EXP2FLD1 DIM       4
EXP2FLD2 DIM       4
EXP2FLD3 DIM       4
EXP2FLD4 DIM       4
         LISTEND
.
         MOVE      "0",EOFSW
         LOOP
           EXPLODE   EXP2DATA,"l",EXP2REC
           IF        ZERO
              MOVE      "1",EOFSW
           ENDIF
           EXPLODE   EXP2REC,",c",EXP2VL
           DISPLAY   "FLD1=",EXP2FLD1," FLD2=",EXP2FLD2," FLD3=",EXP2FLD3," FLD4=",EXP2FLD4
         REPEAT    UNTIL (EOFSW=1)


Gerhard Weiss


Thanks for the example, Steve. I modified an old program that reads a 5MB weekly client-supplied file one byte at a time, looking for the ~ they use as a record terminator (there is no CR/LF in the file). It was a batch process that ran in the middle of the night, so taking about an hour was not a big deal.

With your 'Big Gulp' technique it now takes about 6 seconds to process the file.

My server's HD thanks you.

-Mike Maynard
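A minimal sketch of how Steve's template might be adapted to a file like Mike's, where "~" ends each record and there is no CR/LF. The file name CLIENT.DAT and the 500-byte record size are assumptions:

```plb
. Sketch only: CLIENT.DAT and the 500-byte record size are assumptions.
File     FILE
FileName INIT      "CLIENT.DAT"
Size     FORM      10
FilePos  FORM      "0"
FileData DIM       ^
Record   DIM       500
EOF      FORM      1
.
        OPEN      File,FileName
        POSITEOF  File
        FPOSIT    File,Size
        SMAKE     FileData,Size
        READ      File,FilePos;*ABSON,FileData,*ABSOFF
        CLOSE     File
        LOOP
. Move everything up to the next ~ into Record. ZERO is set when the
. end of FileData was reached, but Record still holds the last record.
            EXPLODE   FileData,"~",Record
            IF        ZERO
                MOVE      "1",EOF
            ENDIF
. Process Record here. If the data ends with a trailing ~, the final
. pass may return an empty Record, so you may want to skip it.
        REPEAT    UNTIL (EOF = 1)
```

With no CR or LF characters in the data, the REPLACE step from the template is left out; if stray 0x1A characters could still show up, a REPLACE like the template's can be added after the READ.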


Also think how much easier it is to test changes made to the program.

You did bring up an interesting condition where the end-of-record marker is not a standard one supported by PL/B. I checked our system by searching for *ABSON and did not find any programs doing that. Darn! I was hoping to fix one.

I bet your network hubs thank Steve too!

Gerhard Weiss






v1.10

Write to MMCC Technical Support at:               Send e-mail to MMCC.
MMCC - Technical Support
600 W. Midland
Bay City, MI 48708
(989) 686-8860
© 1997 - 2024 MMCC - All Rights Reserved