Ada Programming/Libraries/GNAT.AWK

-- --                                                                         -- --                         GNAT COMPILER COMPONENTS                         -- --                                                                         -- --                              G N A T. A W K                            -- --                                                                         -- --                                 S p e c                                  -- --                                                                         -- --                     Copyright (C) 2000-2006, AdaCore                     -- --                                                                         -- -- GNAT is free software;  you can  redistribute it  and/or modify it under -- -- terms of the GNU General Public License as published  by the Free Soft- -- -- ware Foundation;  either version 2,  or (at your option) any later ver- -- -- sion. GNAT is distributed in the hope that it will be useful, but WITH- -- -- OUT ANY WARRANTY; without even the  implied warranty of MERCHANTABILITY -- -- or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License -- -- for more details. You should have received  a copy of the GNU General -- -- Public License distributed with GNAT;  see file COPYING. If not, write -- -- to the  Free Software Foundation,  51  Franklin  Street,  Fifth  Floor, -- -- Boston, MA 02110-1301, USA. -- --                                                                         -- -- -- -- -- -- -- -- -- GNAT was originally developed  by the GNAT team at  New York University. -- -- Extensive contributions were provided by Ada Core Technologies Inc.     -- --                                                                         -- --

-- This is an AWK-like unit. It provides an easy interface for parsing one -- or more files containing formatted data. The file can be viewed seen as -- a database where each record is a line and a field is a data element in --  this line. In this implementation an AWK record is a line. This means -- that a record cannot span multiple lines. The operating procedure is to -- read files line by line, with each line being presented to the user of --  the package. The interface provides services to access specific fields -- in the line. Thus it is possible to control actions taken on a line based -- on values of some fields. This can be achieved directly or by registering -- callbacks triggered on programmed conditions. -- -- The state of an AWK run is recorded in an object of type session. -- The following is the procedure for using a session to control an --  AWK run: -- --    1) Specify which session is to be used. It is possible to use the --        default session or to create a new one by declaring an object of --        type Session_Type. For example: -- --           Computers : Session_Type; -- --     2) Specify how to cut a line into fields. There are two modes: using --       character fields separators or column width. This is done by using --       Set_Fields_Separators or Set_Fields_Width. For example by: -- --          AWK.Set_Field_Separators (";,", Computers); -- --       or by using iterators' Separators parameter. -- --    3) Specify which files to parse. This is done with Add_File/Add_Files --        services, or by using the iterators' Filename parameter. For --        example: -- --           AWK.Add_File ("myfile.db", Computers); -- --     4) Run the AWK session using one of the provided iterators. -- --          Parse --             This is the most automated iterator. You can gain control on --             the session only by registering one or more callbacks (see --              Register). -- --          Get_Line/End_Of_Data --             This is a manual iterator to be used with a loop. You have --             complete control on the session. You can use callbacks but --             this is not required. -- --          For_Every_Line --             This provides a mixture of manual/automated iterator action. -- --       Examples of these three approaches appear below -- -- There are many ways to use this package. The following discussion shows -- three approaches to using this package, using the three iterator forms. -- All examples will use the following file (computer.db): -- --    Pluton;Windows-NT;Pentium III --    Mars;Linux;Pentium Pro --    Venus;Solaris;Sparc --    Saturn;OS/2;i486 --    Jupiter;MacOS;PPC -- -- 1) Using Parse iterator -- --     Here the first step is to register some action associated to a pattern --     and then to call the Parse iterator (this is the simplest way to use --    this unit). The default session is used here. For example to output the --     second field (the OS) of computer "Saturn". -- --           procedure Action is --           begin --              Put_Line (AWK.Field (2)); --           end Action; -- --        begin --           AWK.Register (1, "Saturn", Action'Access); --           AWK.Parse (";", "computer.db"); -- -- --  2) Using the Get_Line/End_Of_Data iterator -- --    Here you have full control. For example to do the same as --    above but using a specific session, you could write: -- --          Computer_File : Session_Type; -- --       begin --          AWK.Set_Current (Computer_File); --          AWK.Open (Separators => ";", --                     Filename   => "computer.db"); -- --          --  Display Saturn OS -- --          while not AWK.End_Of_File loop --             AWK.Get_Line; -- --             if AWK.Field (1) = "Saturn" then --                Put_Line (AWK.Field (2)); --             end if; --          end loop; -- --          AWK.Close (Computer_File); -- -- -- 3) Using For_Every_Line iterator -- --     In this case you use a provided iterator and you pass the procedure --     that must be called for each record. You could code the previous --     example could be coded as follows (using the iterator quick interface --    but without using the current session): -- --           Computer_File : Session_Type; -- --           procedure Action (Quit : in out Boolean) is --           begin --              if AWK.Field (1, Computer_File) = "Saturn" then --                 Put_Line (AWK.Field (2, Computer_File)); --              end if; --           end Action; -- --           procedure Look_For_Saturn is --              new AWK.For_Every_Line (Action); -- --        begin --           Look_For_Saturn (Separators => ";", --                           Filename   => "computer.db", --                           Session    => Computer_File); -- --           Integer_Text_IO.Put --             (Integer (AWK.NR (Session => Computer_File))); --           Put_Line (" line(s) have been processed."); -- --  You can also use a regular expression for the pattern. Let us output --  the computer name for all computer for which the OS has a character --  O in its name. -- --           Regexp   : String := ".*O.*"; -- --           Matcher  : Regpat.Pattern_Matcher := Regpat.Compile (Regexp); -- --           procedure Action is --           begin --              Text_IO.Put_Line (AWK.Field (2)); --           end Action; -- --        begin --           AWK.Register (2, Matcher, Action'Unrestricted_Access); --           AWK.Parse (";", "computer.db"); --

with Ada.Finalization; with GNAT.Regpat;

package GNAT.AWK is

Session_Error : exception; -- Raised when a Session is reused but is not closed

File_Error : exception; -- Raised when there is a file problem (see below)

End_Error : exception; -- Raised when an attempt is made to read beyond the end of the last -- file of a session.

Field_Error : exception; -- Raised when accessing a field value which does not exist

Data_Error : exception; -- Raised when it is impossible to convert a field value to a specific type

type Count is new Natural;

type Widths_Set is array (Positive range <>) of Positive; -- Used to store a set of columns widths

Default_Separators : constant String := " " & ASCII.HT;

Use_Current : constant String := ""; -- Value used when no separator or filename is specified in iterators

type Session_Type is limited private; -- This is the main exported type. A session is used to keep the state of  --  a full AWK run. The state comprises a list of files, the current file, -- the number of line processed, the current line, the number of fields in   --  the current line... A default session is provided (see Set_Current,  --  Current_Session and Default_Session above).

-- Package initialization --

-- To be thread safe it is not possible to use the default provided -- session. Each task must used a specific session and specify it  --  explicitly for every services.

procedure Set_Current (Session : Session_Type); -- Set the session to be used by default. This file will be used when the -- Session parameter in following services is not specified.

function Current_Session return Session_Type; -- Returns the session used by default by all services. This is the -- latest session specified by Set_Current service or the session -- provided by default with this implementation.

function Default_Session return Session_Type; -- Returns the default session provided by this package. Note that this is  --  the session return by Current_Session if Set_Current has not been used.

procedure Set_Field_Separators (Separators : String      := Default_Separators;      Session    : Session_Type); procedure Set_Field_Separators (Separators : String      := Default_Separators); -- Set the field separators. Each character in the string is a field -- separator. When a line is read it will be split by field using the -- separators set here. Separators can be changed at any point and in this -- case the current line is split according to the new separators. In the -- special case that Separators is a space and a tabulation -- (Default_Separators), fields are separated by runs of spaces and/or -- tabs.

procedure Set_FS (Separators : String      := Default_Separators;      Session    : Session_Type) renames Set_Field_Separators; procedure Set_FS (Separators : String      := Default_Separators) renames Set_Field_Separators; -- FS is the AWK abbreviation for above service

procedure Set_Field_Widths (Field_Widths : Widths_Set;     Session      : Session_Type); procedure Set_Field_Widths (Field_Widths : Widths_Set); -- This is another way to split a line by giving the length (in number of   --  characters) of each field in a line. Field widths can be changed at any -- point and in this case the current line is split according to the new -- field lengths. A line split with this method must have a length equal or  --  greater to the total of the field widths. All characters remaining on  --  the line after the latest field are added to a new automatically -- created field.

procedure Add_File (Filename : String;     Session  : Session_Type); procedure Add_File (Filename : String); -- Add Filename to the list of file to be processed. There is no limit on  --  the number of files that can be added. Files are processed in the order -- they have been added (i.e. the filename list is FIFO). If Filename does -- not exist or if it is not readable, File_Error is raised.

procedure Add_Files (Directory            : String;      Filenames             : String;      Number_Of_Files_Added : out Natural;      Session               : Session_Type); procedure Add_Files (Directory            : String;      Filenames             : String;      Number_Of_Files_Added : out Natural); -- Add all files matching the regular expression Filenames in the specified -- directory to the list of file to be processed. There is no limit on  --  the number of files that can be added. Each file is processed in  --  the same order they have been added (i.e. the filename list is FIFO). -- The number of files (possibly 0) added is returned in   --  Number_Of_Files_Added.

-  -- Information about current state -- -

function Number_Of_Fields (Session : Session_Type) return Count; function Number_Of_Fields return Count; pragma Inline (Number_Of_Fields); -- Returns the number of fields in the current record. It returns 0 when -- no file is being processed.

function NF    (Session : Session_Type) return Count renames Number_Of_Fields; function NF    return Count renames Number_Of_Fields; -- AWK abbreviation for above service

function Number_Of_File_Lines (Session : Session_Type) return Count; function Number_Of_File_Lines return Count; pragma Inline (Number_Of_File_Lines); -- Returns the current line number in the processed file. It returns 0 when -- no file is being processed.

function FNR (Session : Session_Type) return Count renames Number_Of_File_Lines; function FNR return Count renames Number_Of_File_Lines; -- AWK abbreviation for above service

function Number_Of_Lines (Session : Session_Type) return Count; function Number_Of_Lines return Count; pragma Inline (Number_Of_Lines); -- Returns the number of line processed until now. This is equal to number -- of line in each already processed file plus FNR. It returns 0 when -- no file is being processed.

function NR (Session : Session_Type) return Count renames Number_Of_Lines; function NR return Count renames Number_Of_Lines; -- AWK abbreviation for above service

function Number_Of_Files (Session : Session_Type) return Natural; function Number_Of_Files return Natural; pragma Inline (Number_Of_Files); -- Returns the number of files associated with Session. This is the total -- number of files added with Add_File and Add_Files services.

function File (Session : Session_Type) return String; function File return String; -- Returns the name of the file being processed. It returns the empty -- string when no file is being processed.

-  -- Field accessors -- -

function Field (Rank   : Count;      Session : Session_Type) return String; function Field (Rank   : Count) return String; -- Returns field number Rank value of the current record. If Rank = 0 it  --  returns the current record (i.e. the line as read in the file). It  --  raises Field_Error if Rank > NF or if Session is not open.

function Field (Rank   : Count;      Session : Session_Type) return Integer; function Field (Rank   : Count) return Integer; -- Returns field number Rank value of the current record as an integer. It  --  raises Field_Error if Rank > NF or if Session is not open. It  --  raises Data_Error if the field value cannot be converted to an integer.

function Field (Rank   : Count;      Session : Session_Type) return Float; function Field (Rank   : Count) return Float; -- Returns field number Rank value of the current record as a float. It  --  raises Field_Error if Rank > NF or if Session is not open. It  --  raises Data_Error if the field value cannot be converted to a float.

generic type Discrete is (<>); function Discrete_Field (Rank   : Count;      Session : Session_Type) return Discrete; generic type Discrete is (<>); function Discrete_Field_Current_Session (Rank   : Count) return Discrete; -- Returns field number Rank value of the current record as a type -- Discrete. It raises Field_Error if Rank > NF. It raises Data_Error if  --  the field value cannot be converted to type Discrete.

-- Pattern/Action --

-- AWK defines rules like "PATTERN { ACTION }". Which means that ACTION -- will be executed if PATTERN match. A pattern in this implementation can -- be a simple string (match function is equality), a regular expression, -- a function returning a boolean. An action is associated to a pattern -- using the Register services. --  --  Each procedure Register will add a rule to the set of rules for the -- session. Rules are examined in the order they have been added.

type Pattern_Callback is access function return Boolean; -- This is a pattern function pointer. When it returns True the associated -- action will be called.

type Action_Callback is access procedure; -- A simple action pointer

type Match_Action_Callback is    access procedure (Matches : GNAT.Regpat.Match_Array); -- An advanced action pointer used with a regular expression pattern. It  --  returns an array of all the matches. See GNAT.Regpat for further -- information.

procedure Register (Field  : Count;      Pattern : String;      Action  : Action_Callback;      Session : Session_Type); procedure Register (Field  : Count;      Pattern : String;      Action  : Action_Callback); -- Register an Action associated with a Pattern. The pattern here is a  --  simple string that must match exactly the field number specified.

procedure Register (Field  : Count;      Pattern : GNAT.Regpat.Pattern_Matcher;      Action  : Action_Callback;      Session : Session_Type); procedure Register (Field  : Count;      Pattern : GNAT.Regpat.Pattern_Matcher;      Action  : Action_Callback); -- Register an Action associated with a Pattern. The pattern here is a  --  simple regular expression which must match the field number specified.

procedure Register (Field  : Count;      Pattern : GNAT.Regpat.Pattern_Matcher;      Action  : Match_Action_Callback;      Session : Session_Type); procedure Register (Field  : Count;      Pattern : GNAT.Regpat.Pattern_Matcher;      Action  : Match_Action_Callback); -- Same as above but it pass the set of matches to the action -- procedure. This is useful to analyse further why and where a regular -- expression did match.

procedure Register (Pattern : Pattern_Callback;     Action  : Action_Callback;      Session : Session_Type); procedure Register (Pattern : Pattern_Callback;     Action  : Action_Callback); -- Register an Action associated with a Pattern. The pattern here is a  --  function that must return a boolean. Action callback will be called if  --  the pattern callback returns True and nothing will happen if it is   --  False. This version is more general, the two other register services -- trigger an action based on the value of a single field only.

procedure Register (Action : Action_Callback;      Session : Session_Type); procedure Register (Action : Action_Callback); -- Register an Action that will be called for every line. This is  --  equivalent to a Pattern_Callback function always returning True.

-- Parse iterator --

procedure Parse (Separators : String := Use_Current;     Filename   : String := Use_Current;      Session    : Session_Type); procedure Parse (Separators : String := Use_Current;     Filename   : String := Use_Current); -- Launch the iterator, it will read every line in all specified -- session's files. Registered callbacks are then called if the associated -- pattern match. It is possible to specify a filename and a set of  --  separators directly. This offer a quick way to parse a single -- file. These parameters will override those specified by Set_FS and -- Add_File. The Session will be opened and closed automatically. -- File_Error is raised if there is no file associated with Session, or if   --  a file associated with Session is not longer readable. It raises -- Session_Error is Session is already open.

---  -- Get_Line/End_Of_Data Iterator -- ---

type Callback_Mode is (None, Only, Pass_Through); -- These mode are used for Get_Line/End_Of_Data and For_Every_Line -- iterators. The associated semantic is: --  --    None --      callbacks are not active. This is the default mode for --      Get_Line/End_Of_Data and For_Every_Line iterators. --  --    Only --      callbacks are active, if at least one pattern match, the associated --      action is called and this line will not be passed to the user. In  --       the Get_Line case the next line will be read (if there is some   --       line remaining), in the For_Every_Line case Action will --      not be called for this line. --  --    Pass_Through --      callbacks are active, for patterns which match the associated --      action is called. Then the line is passed to the user. It means --      that Action procedure is called in the For_Every_Line case and --      that Get_Line returns with the current line active. --

procedure Open (Separators : String := Use_Current;     Filename   : String := Use_Current;      Session    : Session_Type); procedure Open (Separators : String := Use_Current;     Filename   : String := Use_Current); -- Open the first file and initialize the unit. This must be called once -- before using Get_Line. It is possible to specify a filename and a set of  --  separators directly. This offer a quick way to parse a single file. -- These parameters will override those specified by Set_FS and Add_File. -- File_Error is raised if there is no file associated with Session, or if   --  the first file associated with Session is no longer readable. It raises -- Session_Error is Session is already open.

procedure Get_Line (Callbacks : Callback_Mode := None;     Session   : Session_Type); procedure Get_Line (Callbacks : Callback_Mode := None); -- Read a line from the current input file. If the file index is at the -- end of the current input file (i.e. End_Of_File is True) then the -- following file is opened. If there is no more file to be processed, -- exception End_Error will be raised. File_Error will be raised if Open -- has not been called. Next call to Get_Line will return the following -- line in the file. By default the registered callbacks are not called by  --  Get_Line, this can activated by setting Callbacks (see Callback_Mode   --  description above). File_Error may be raised if a file associated with -- Session is not readable. --  --  When Callbacks is not None, it is possible to exhaust all the lines -- of all the files associated with Session. In this case, File_Error -- is not raised. --  --  This procedure can be used from a subprogram called by procedure Parse -- or by an instantiation of For_Every_Line (see below).

function End_Of_Data (Session : Session_Type) return Boolean; function End_Of_Data return Boolean; pragma Inline (End_Of_Data); -- Returns True if there is no more data to be processed in Session. It  --  means that the latest session's file is being processed and that -- there is no more data to be read in this file (End_Of_File is True).

function End_Of_File (Session : Session_Type) return Boolean; function End_Of_File return Boolean; pragma Inline (End_Of_File); -- Returns True when there is no more data to be processed on the current -- session's file.

procedure Close (Session : Session_Type); -- Release all associated data with Session. All memory allocated will -- be freed, the current file will be closed if needed, the callbacks -- will be unregistered. Close is convenient in reestablishing a session -- for new use. Get_Line is no longer usable (will raise File_Error) -- except after a successful call to Open, Parse or an instantiation -- of For_Every_Line.

-  -- For_Every_Line iterator -- -

generic with procedure Action (Quit : in out Boolean); procedure For_Every_Line (Separators : String := Use_Current;     Filename   : String := Use_Current;      Callbacks  : Callback_Mode := None;      Session    : Session_Type); generic with procedure Action (Quit : in out Boolean); procedure For_Every_Line_Current_Session (Separators : String := Use_Current;     Filename   : String := Use_Current;      Callbacks  : Callback_Mode := None); -- This is another iterator. Action will be called for each new -- record. The iterator's termination can be controlled by setting Quit -- to True. It is by default set to False. It is possible to specify a  --  filename and a set of separators directly. This offer a quick way to  --  parse a single file. These parameters will override those specified by  --  Set_FS and Add_File. By default the registered callbacks are not called -- by For_Every_Line, this can activated by setting Callbacks (see   --  Callback_Mode description above). The Session will be opened and -- closed automatically. File_Error is raised if there is no file -- associated with Session. It raises Session_Error is Session is already -- open.

private type Session_Data; type Session_Data_Access is access Session_Data;

type Session_Type is new Ada.Finalization.Limited_Controlled with record Data : Session_Data_Access; end record;

procedure Initialize (Session : in out Session_Type); procedure Finalize  (Session : in out Session_Type);

end GNAT.AWK; |GNAT.AWK