HSC Information Processing and Technology/Information Systems and Databases

Characteristics of an Information System

 * organise data into information (needs to be done carefully otherwise data becomes meaningless, uses data dictionaries)
 * analysing information to give knowledge (access to information purpose of system, analysing done by tables, queries and reports, decisions based on this information)

Organisation of Data
Data must be organised before it can be analysed by the information system. This must be done carefully, though, or the resulting information will be meaningless. This may require sorting, summarising or classifying. Data is organised using structures such as data dictionaries.

Analyses of Knowledge
Access to information and the resulting knowledge is the purpose of an information system. For people to gain knowledge the information must be analysed. Information systems provide a range of tools for analysis of data such as tables, queries and reports. People make decisions based on the information they receive from an information system.

Types and Purposes of Information Systems

 * TRANSACTION PROCESSING SYSTEMS (TPSs)- collect, store, modify and retrieve the daily transactions of an organisation e.g. a point-of-sale terminal. There are two types of transaction processing:
 * BATCH PROCCESSING collects the transaction data into a group and processes it later and is currently used where the data is in paper form such as cheques. This type has a time delay.
 * REAL-TIME PROCESSING works where each transaction is immediately processed providing instant confirmation but it does require access to an online database.
 * DECISION SUPPORT SYSTEMS (DSS)- assist people to make decisions by providing information, models and analysis tools. They can be used on a daily basis or when an organisation has to react to something unexpected or make changes. Expert systems are a type of DSS.
 * EXPERT SYSTEMS- provide information and solve problems that would otherwise require a person experienced in that field (an expert). They are useful in diagnosing, monitoring, selecting, designing, predicting and training. An expert system asks users a set of questions and compares answers to a knowledge base, which is a set of general facts and if-then rules supplied by an expert. It must then reason to attain a solution. These are not always correct and the choice is up to the user.
 * MANGEMENT INFORMATION SYSTEMS (MISs)- provide information for the organisation’s managers. An MIS presents basic facts about the performance of the organisation e.g. a budget or report. The awareness of how performance is measured provides motivation for workers and helps make decisions. A special type is called the Executive Information System (EIS) which is designed for the information needs of senior managers and provides strategic information.
 * OFFICE AUTOMATION SYSTEMS- provide people with effective ways to complete administrative tasks in an organisation. They use software tools such as word processors, databases etc. and also use communication technology.
 * A GEOGRAPHIC INFORMATION SYSTEM  (GIS), geographical information science, or geospatial information studies is a system designed to capture, store, manipulate, analyze, manage, and present all types of geographically referenced data.[1] In the simplest terms, GIS is the merging of cartography, statistical analysis, and database technology.,
 * DATABASE INFORMATION SYSTEMS (DIS)- Most information systems (IS) store data in a database. Databases are accessed by a Database Management System (DBMS). A DBMS is a software package that allow users to manage data. SOme examples include; MS Access, FileMaker Pro, MySQL.

Examples
Information systems that use a database are called database information systems. Databases are accessed by a database management system (DBMS) which has no data in it but is a software package that allows the user to enter maintain and provide access to a database. The user can choose which data is required and how to display it in a meaningful way. The term database is often used instead of DBMS.

Organisation Methods
Organising is the process of arranging, representing and formatting data. A database is an organised collection of data.

Non-Computer Methods
A database is simply a place to organise and store data so that it can be retrieved later for a particular purpose. A telephone book is a non-computer database that organises data alphabetically according to a person’s family name. Searching for the person’s name retrieves their phone number. A paper filing system is a non-computer database. It involves paper, folders and filing cabinets to store data in an organised way. Searching the folders in the filing cabinet retrieves a single piece of paper. Non-computer databases are often the best way to organise data. It is convenient to obtain information from a book or access it from a filing cabinet. People can easily and inexpensively organise data using a non-computer data-base. They do not need a computer or computer skills. Non-computer databases are easier to keep secure, and they remain more private. The data is more difficult to use for other purposes. For example, a telephone book does not allow a thief to search for an address and find the phone number of a house. Computer-based databases are being increasingly used to organise and store data. Some examples include taxation records, library systems, car registrations, student records, CD-ROM encyclopedias and census data. Computer-based databases have several advantages over non-computer databases: Computer-based methods of organising include flat file systems, database management systems and hypermedia.
 * Easily edited
 * Large storage
 * Fast retrieval.
 * Display options.

Electronic/Computer-Based Methods of Data Organisation
•	COMPUTER BASED DATABASES such as car registrations and CD-ROM encyclopedias, have several advantages over non-computer based methods:
 * EASILY EDITED. Data can be corrected and updated without retyping.
 * LARGE STORAGE. Data is stored on a disk and retrieved when required.
 * FAST RETRIEVAL. Data is searched and sorted quickly and accurately to obtain required information.
 * DISPLAY OPTIONS. Data is presented in a number of ways using tables, forms and reports.

Flat File Databases
A flat file database organises data into a single table and is suitable for many small applications. using data structures called files, records, fields, and characters
 * A file is a block of data. When you have done some work on the computer, it is stored in a file. The Address database is a file. A file in a database is divided into a set of related records.
 * A record is a collection of facts about one specific entry in a database. In the Address database, a record is information about a person’s name, address and home phone. A record is divided into one or more related fields.
 * A field is a specific category of data in a database. In the Address database, the family name, given name, street, suburb, postcode and home phone are fields. Data items in a field are made up of characters.
 * A character is the smallest unit of data that people can use. Characters include letters, numbers, and special symbols.

Keys are fields that are used to sort and retrieve information. Usually, each key holds a unique item that applies to only one record. When the records are sorted, the key is used so that not all the data has to be read or retrieved. There are different types of keys A single key is a field in which each item of data is unique. Care must be taken when choosing a single key, as some fields, such as Family Name, are not always unique.
 * A composite key, or compound key, is made by joining two or more fields together. It is used when no data item in any field can be guaranteed to be unique. For example, a compound key can be made from such fields as Gender and Date of Birth.
 * A primary key is a single/unique key or compound key that must have a value. Primary keys cannot be empty or null.
 * A secondary key is a field that contains useful items of data often used in searches. Unlike other keys, secondary keys are not always unique.

Relational Database Systems
A relational database organises data using a series of related tables. Relationships are built between the tables to provide a flexible way of manipulating and combining data. Forms are used to view, enter and change data in the tables. Relational databases are the most commonly used database structure. The organisation of data in a relational database involves a schema.

Schemas
A schema (pronounced ‘skee-ma’) is the data definition for a database. It is an organised plan of the entire database showing how and where the data is found, descriptions of the data, and the data’s logical relationships. In a relational database, the schema defines the entities, attributes and relationships:
 * An entity is a specific thing about which information is collected and stored. The School database has an entity for students, for teachers, for classes and for students and classes A flat file database would have a separate file for each of these entities.
 * An attribute is a defined property of an entity. The Students entity in the School database has such attributes as LastName, FirstName and PhoneNumber. Each attribute of the entity contains a data item. Attributes are the same as fields in a flat file database.
 * A relationship is the way the entities are related to each other. In the School database, the entities are related using keys, such as StudentID. Entities are related in three ways
 * A one-to-one relationship occurs when each record in the first entity is related to exactly one record in the second entity. For example, in primary school each class has one teacher and each teacher relates to only one class.
 * A one-to-many relationship occurs when one record in the first entity is related to many records in the second entity, but any record in the second entity only relates to one record in the first entity. For example, one student can take out many books from the library, but any one book can only be taken out by one student at a time.
 * A many-to-many relationship occurs when each record in the first entity is related to many records in the second entity, and each record in the second entity is related to many records in the first entity. For example, each student studies many subjects, and each subject is studied by many students.

A foreign key is an attribute (field) of a table that is a primary key of another table. In the School database, the StudentID attribute is a foreign key of the Students and Classes table and a primary key of the Students table. The data in a foreign key of one table must match the data in the primary key of another table.

A secondary key is a field that is identified as being suitable for indexing the data such as a surname, It is used to sort the data in a different order to the primary key, A table can have many secondary keys in fact every field could be a secondary key.

Tables
Information about an entity is displayed in a table. A table is the organisation of data into columns and rows. It is sometimes referred to as a relation. A column of a table is also an attribute of an entity or a field of a record. The data in a column must have the same data type and have a distinct name. A row in a table is also called a tuple of an entity, or one record. Each row must be uniquely identified by a key. The intersection of a row and column stores a particular data item, such as ‘Rebecca’ in the first row and second column of the School database. The rows and columns in a table can be viewed in any order without affecting the contents of the table.

Forms
Data is viewed for different purposes using forms. A form is used to view, enter, and change data in a table. The layout of the form can be changed. The user can position fields, headings, instructions and graphics. A well designed form provides information explaining the required data and any data entry rules that apply to particular fields. For example, the field name ‘Sex (M/F)’ leaves no confusion about what data is required and how it should be entered. In a relational database, forms can be based on a single table or on multiple tables.

Advantages
Data independence

Elimination of data redundancy

Easier maintenance of data integrity

Centralised security - greater control over data

Disadvantages
Large size

High Cost

More hardware needed

Higher impact of failure

Data Modelling Tools for Organising Data
Data modelling is the process of identifying entities, the relationships between those entities and the attributes of those entities. It is used to develop a schema for the database. Data modelling is critical to creating an efficient database. There are a number of tools used for data modelling, such as data dictionaries, schematic diagrams and normalisation.

Data Dictionaries
A data dictionary is a comprehensive description of each field (attribute) in the database. It contains information about the characteristics of each item entered in the database, such as the field name, field size, data type and field description:
 * Field name is the name of the field. It should be carefully selected to avoid confusion. Field names should be relatively short but clear and unambiguous.
 * Data type (or field type) is the kind of data that can be stored in the field. Each field stores data in a single data type. Some common data types are text, memo, number, currency, yes/no and date/time. Logical fields contain the logic values ‘true’ or ‘false’.
 * Data Format shows how the data is to be displayed
 * Field size, or width, is the number of characters allowed in each field. It should be limited to the smallest number of characters likely to be needed, as smaller field sizes let the database work faster.
 * Description specifies the contents of the field.
 * Example provides an example of a valid entry in the field.

A data dictionary is a fundamental tool in the development of a database. It consists of metadata, or information about data. It provides a common ground for people working on a project at the same time. For example, if people are working on different entities, they can refer to the data dictionary to check whether a particular attribute already exists. This reduces data redundancy (any undesirable duplication of data) within a database.

Schematic Diagrams
Schematic diagrams are graphical tools that help define the database and develop a schema. A common schematic diagram is called an entity-relationship diagram. An entity-relationship diagram (ERD) is a graphical method of identifying the entities and showing the relationships between them. It helps to determine the data to be included in and excluded from the database. ERDs force people to have a common understanding of the database. They are a useful tool to explain the database. There are numerous notations for ERD.

Normalising Data
Normalisation is the process of organising data into tables so that the results of using the database are unambiguous and as intended. It is a refinement process that aims to reduce data redundancy. Normalisation results in a database that is more efficient but more complex because data is separated into more tables. For example, a table is used to keep track of customer purchases and the price of the products. If you deleted a customer, you would also delete a price. Normalising would solve this problem by dividing this table into two tables: a table about each customer and a product they had bought and a table about each product and its price. Making additions or deletions to either table would not affect the other. The degree of normalisation has been defined in terms of forms, such as:
 * First Norm Form (1NF)—moving data items into separate tables where the data in each table is a similar type. Each table is given a primary key. 1NF is the basic level of normalisation and usually corresponds to the definition of a database.
 * Domain/key normal form (DKNF)—a key uniquely identifies each row in a table. A domain is the set of permissible values for each attribute. Enforcing key and domain restrictions ensures there are no problems when modifying the database. DKNF is a final level of normalisation that most designers aim to achieve.

Hypermedia
Hypermedia is a combination of media whose locations are linked electronically to provide an easy way to navigate between the information. The information is stored using a set of documents that may contain text, images, audio, video or executable programs. Each document is independent, and information is retrieved using hypertext. Hypertext is a system that allows documents to be cross-linked in such a way that the user can move from one document to another by clicking on a link. A link, or hyperlink, is indicated by a highlighted item (usually an underlined word or a graphic) that allows the electronic connection. The author of the hypertext must specify the location of the information accessed by each link. If another computer is the destination of the link, that computer is called a node. The most well-known application of hypermedia and hypertext is the World Wide Web. On the Web, each document is accessed through its uniform/universal resource locator (URL).

HTML
Web pages are created using hypertext markup language (HTML). HTML is a set of special instructions that indicate how parts of a document will be displayed and navigated. HTML documents are actually text files that contain HTML instructions. Many applications will save a document in HTML and insert the special instructions. HTML documents are viewed in a Web browser. An HTML editor is a software program that specialises in writing HTML code. Instructions in HTML are given using HTML tags. HTML tags are metadata, or information about data. A tag consists of a left angle bracket (<), a tag name and a right angle bracket (>).Tags are usually paired, such as  and , to start and end the tag instruction. The end tag looks like the start tag except a slash (/) precedes the text within the brackets. Every HTML document contains certain standard tags, such as and. The tag precedes the title and the tag indicates the beginning of the actual text.

Uniform resource locator
A URL is the address of a file or resource on the Web. It is usually written in lower case, as a single unbroken line, with no spaces. The main parts of the URL are the: URLs must be complete and exact; if they are not, the file or resource will not be found. It is not necessary to memorise URLs. Web browsers have the ability to store the addresses of Web sites. These stored URLs are called bookmarks or favourites and can be organised into categories or folders to save you time.
 * protocol—the transfer method used to access a particular resource on the Web. In most cases the protocol will be ‘http://’ (hypertext transfer protocol).
 * domain name—address of the specific computer where the resource is located, such as ‘www.hi.com.au’. The domain name can be replaced by the IP address.
 * file path—the full path to the file to be retrieved. It may be a simple filename, or it may be a directory path, such as ‘atlas/about.htm’. The domain name and the file path are separated by a forward slash (/).

Storyboard
A storyboard is a series of frames, each representing a different action or screen image. It is a tool used to organise hypermedia. Storyboards are drawn on paper or using a computer. They are frequently edited. They define each screen and its specific media type. Storyboards consist of navigation paths, information and graphics. They are popular because they are simple to construct, are easy to read and can be modified at every stage. There are four storyboard layouts:
 * linear—a simple sequential path that is set up quickly
 * hierarchical—a sequential path in a top-down design; the user starts at the beginning and moves down through the multimedia product
 * non-linear—no structure; the user moves between different layouts in any direction
 * combination—a blending of the above layouts.

Storage and Retrieval
A database management system is used to provide secure and efficient methods for storage and retrieval of information from a database.

DATABASE MANAGEMENT SYSTEM (DBMS)- is a software package that allows users to access a database so they can enter, maintain and view the data. In a DBMS data is organised into tables, viewed in forms, retrieved using queries and displayed in reports. DBMSs manipulate the data in ways such as searching and sorting. They also perform a number of tasks to help users develop and maintain a database:
 * organising the data using a data dictionary
 * showing relationships between entities using schematic diagrams
 * checking for identifiable errors in data entry
 * allowing flexibility to change the definition of the database
 * restricting access to the data to authorised people
 * providing information about the performance of a database.

Accessing Data

 * SEQUENTIAL ACCESS occurs when data is accessed in a sequence. Data is accessed in the order it was stored. It does not require the exact location of the data item. Sequential is much slower than direct and impractical when immediate processing is required. It is the only method of accessing data stored on magnetic tape.


 * DIRECT ACCESS occurs when data is accessed without accessing previous data items. Data is stored in a particular storage location based on a mathematical procedure or algorithm. Direct access uses this algorithm to calculate the approximate location of the data. If the data is not found here the computer searches through successive locations until it is found. Direct data often requires the use of an index and this is called indexed access.


 * INDEX- is a table that contains information about the location of data. Data is accessed by referring to the index and obtaining the exact location of the data. The indexed method is widely used to store data on a disk, as well as store the logical location of data in a database.

Distributed Database Systems
DISTRIBUTED DATABASES- are databases located at more than one site. They act as a single collection of data that is geographically dispersed. Distributed databases reduce data transmission costs that would occur if all users at all sites had to access one centralised database. However they make it more difficult to obtain a complete view of the database. Distributed databases often need to be synchronised to keep them current. A two-phase commit is a part of the DBMS used to maintain consistency across a distributed database. It checks whether both parts of the database are ready for transmission and then performs it.

Storage Media

 * ONLINE STORAGE uses a peripheral device that is under the user’s direct control e.g. a hard disk.
 * OFFLINE STORAGE uses a peripheral device that is not under the user’s direct control e.g. a centralised database.

Both online and offline storage use a variety of peripheral devices and storage mediums:


 * HARD DISK- is a storage device made of metal or glass and covered with magnetic material. Usually mounted or fixed in case with storage capacity measured in Gb and accessed using direct access.


 * OPTICAL DISK- is a polycarbonate plastic disk with a reflective layer of metal covering the surface. Data is read and written using lasers and data is retrieved using direct access. Types:
 * CD-ROMS (Compact Disks Read Only Memory) are 12cm and capable of storing 650Mb. Convenient for storing constant data.
 * DVD-ROM (Digital Versatile Disk Read Only Memory) are same physical size as CD but store between 4.7 and 17Gb.

-Zip disks store 100 or 250Mb of data and have a transfer rate of 1Mbps. -Jaz disks are an upgrade of Zips. They store up to 2Gb and have a transfer rate of 5.5Mbps.
 * REMOVABLE CARTRIDGE- is a hard disk encased in a plastic or metal cartridge that can be removed like a floppy disk. Fast (yet not as fast as a fixed hard disk) and use direct access. Two types:


 * MAGNETIC TAPE- is a very thin, long strip of plastic, coated with a thin layer of magnetic material. The tape is wound on 2 reels inside a cartridge. They can store large quantities of data, inexpensively, using sequential access and are usually used as a back-up medium.

Encryption and Decryption
ENCRYPTION is the process of coding data, and decryption is the process of decoding it. It is the most effective way of achieving data security. Data is encoded, transmitted and then converted back to its original form. Encryption is essential for financial transactions and is used extensively on the Internet. Methods often involve complex manipulation of bit patterns. One problem is to find a method that is difficult to decode yet practical to use. Two main types:
 * ASYMMETRIC ENCRYPTION requires a key for encryption and a key for decryption. Common method is public key encryption, which involves a public key that is widely available and used for encryption and a private key that is kept secret and used for decryption. Both are developed using complicated number theory.
 * SYMMETRIC ENCRYPTION requires the same key for both encryption and decryption. Common method is Data Encryption Standard (DES).

Data Backup and Security
BACKUP- is another copy of data that can be used to rebuild the system. DBMSs contain backup and recovery capabilities where backups are created at specified times. If the system goes down the recovery process rebuilds the data. It uses the last completed backup and a journal listing all the actions completed by the DBMS since the last backup. The success of backup and recovery depends on implementing appropriate procedures. Backups are usually stored offsite or in a fireproof safe. Backups eliminate the need for replacing data, which is an enormously costly action.

DATA SECURITY- involves a series of safeguards to protect the data. First access is limited to authorised users using passwords, personal objects and biometric devices. Further safeguards involve encryption and firewalls.
 * FIREWALL- checks the password of anyone trying to access the network (can be used on Internet also) to verify and authenticate incoming data. Expensive to install and maintain. Large systems require more than one, as barriers need to be placed at critical points.

Sorting
SORTING is the process of arranging data in a particular order. Meaningfully organised data in a database is easier to use. Sorts are performed in ascending or descending order. Sorting can be done on more than one field (attribute) so that records are arranged in the most useful order.

Searching
SEARCHING is the process of examining the database to retrieve data. In small databases the effective way to do this is to browse or use the Find or Search command. In large databases it is done with a query.

Structured Query Language
QUERY- is a search of a database for records that meet a certain condition. It is a question you ask the database. The results are usually displayed in a table but can be used as the basis for a form, report, graph or another query. A query can also update or delete multiple records at the same time and perform calculations on data. Queries are constructed in the form :-  .

The data is often called the criteria. A common method for constructing a query is query by example, which requires a user to enter the criteria against a field. If the search of a database involves two or more entities it is called a relational query.

QUERY LANGUAGE- is a specialised language designed to search a database.

STRUCTURED QUERY LANGUAGE (SQL) is a query language used to access and manipulate data in a relational database. SQL statements contain keywords that are used to perform a particular task. When searching in SQL it is essential to use correct syntax. In most DBMSs keywords are in uppercase, fields are separated by commas, an entity and an attribute in that entity are separated by a fullstop, and the search criteria or data item is enclosed in double quotes. The keyword ORDER BY sorts on a field in ascending (ASC) or descending (DESC) order with ascending as the default.

OPERATORS are used to construct some queries. The operator represents the action to be performed. Operators are classified as relational or logical:


 * RELATIONAL OPERATORS (=, <>, <, >, <=, >=) are characters or symbols indicating the relationship between two expressions.
 * WILDCARD CHARACTERS represent one or more unknown characters. Common wildcards include (*) which substitutes for any number of characters, and the (?) which substitute for just one.


 * LOGICAL OPERATORS (AND, OR, NOT) are used to combine queries so that a search is carried out on one or more fields. It is important to understand the difference between AND and OR operators:
 * AND operator requires both the first and second query to be true as it retrieves records that satisfy both.
 * OR operator requires either the first or second query to be true as it retrieves records that satisfy either e.g. ‘Last Name = Howe’ OR ‘Suburb = Kearns’.

SELECT ARTWORKS.ArtisName, ARTWORKS.ArtistSurname, GALLERIES.GalleryName, ARTWORKS.Cost     //SELECT the attributes
 * Example

FROM GALLERIES INNER JOIN ARTWORKS ON GALLERIES.GalleryID = ARTWORKS.Gallery               //FROM the entities

WHERE (((ARTWORKS.Cost)>5000))                                                            //meeting these criteria

ORDER BY ARTWORKS.ArtisName DESC;                                                        //ASC or DESC order

Using a Search Engine
SEARCH ENGINE- is a database of indexed Web sites that allows a keyword search. Search engines are used to search hypermedia. An index is a table, which contains information about the location of data. Indexes allow documents to be found using a keyword search. A search engine’s index is built by regularly scanning the Web for new sites and accepting submissions from Webpage authors. The scanning is often completed by programs called spiders, crawlers or robots. They send back the URL of any document they find to the search engines indexing software. This indexing software collects information, such as titles and keywords, from the Web sites then indexes these in a database. Each search engine has a different method of building its index, which is why searches of the same keyword using different search engines obtains different results. A keyword is entered related to the topic of interest. Most search engines allow a search on a series of keywords. When the user requests a keyword the search engine scans the index and displays a list of Web sites that contain that keyword. Choosing the right keyword is essential to the success of a search. More than one word, a synonym, should be used and it should be very specific. Search engines allow the use of Boolean qualities such as ‘AND’, ‘OR’ and ‘NOT’ to refine a search. Some search engines also offer dictionaries.

DICTIONARIES are lists of Web sites organised into categories (e.g. sport). They are another way to find information on the net. The owners of a site need to register it to be included in a directory. As a result directories often cover a small fraction of the pages available on the Net. A directory is useful for finding information on a general topic.

How does a Search Engine Work?
SEARCH ROBOT- is a program that accesses Web sites and gathers information for search engine indexes. Alta Vista uses a robot called Scooter.

Displaying Data in a Database Report
DISPLAYING is presenting the output from an information system or database to meet a given purpose.

Report Design
REPORT- is the formatted and organised presentation of data (e.g. an invoice). A DBMS allows complete control is the design of a report in either tabular or column layout. It is possible to insert headings, sort data, choose the fields, switch fields, change column widths and select records. The purpose of the report determines its content, format and style. Before creating a report the user needs to select the required records by constructing a query. Next they select the fields, which are placed in appropriate positions, formatted and sorted. Most DBMSs offer functions such as count and sum. Most DBMS reports have the following sections:
 * REPORT HEADER- appears once at the beginning of a report. It contains such items as logo, report title and date.
 * PAGE HEADER- displays information such as title, column headings, or any information needed at the top of every page. Appears after report header on first page of report.
 * DETAILS- section displays most of the information.
 * PAGE FOOTER- displays information such as date, page number or anything else wanted at bottom of every page.
 * REPORT FOOTER- appears once at end of report after page footer on last page of report. It displays items such as report totals.

DIFFERENT VIEWS of a database are constructed using a form. A form is used to view, enter, and change data in a table and the layout can be changed. The user can position fields, headings, instructions and graphics. A well-designed form provides information explaining the required data and any rules that apply to particular fields. Good design principles include:
 * headings that identify the purpose of the report
 * layouts such as tabular or column that efficiently present information
 * text that is balanced on the page horizontally or vertically
 * consistent styles throughout report
 * columns with clear descriptive headings
 * appropriate white spaces
 * page numbers and date in header in footer.

Acknowledgement of Data Sources
DATA SOURCE- is the person or organisation that developed the data. Data may come from informal sources, such as conversations, meetings or observations, or it may come from formal sources, such as a report, book or official document. A formal source often provides data that is logically organised. However there is no guarantee to its accuracy. The source of data is protected by the Copyright Act. People cannot copy another’s work without their permission and it is illegal to violate copyright. Most data from the Net is protected by copyright. Text and audio etc. from a Web site should not be used without acknowledgment of and permission from the owner. However the Copyright Act makes special provisions for students to use information for research purposes. The student is allowed to use a reasonable portion of the original work if it is correctly cited. This includes author’s surname and initial or organisation’s name, title of complete work or Web page, URL, date of document, and download date.

The Freedom of Information Act
ACCESS to data is the extent data is available to people. The Freedom of Information Act is designed to allow individuals to find out what data is being kept by the government and other public bodies. It states that individuals have the right to access information where it relates to the individual and does not invade another person’s privacy. There is free release of this information. However knowing what data is kept does not indicate how it is being used. People can use our preferences, weaknesses and habits to their advantage. Ownership and control of such data is an ethical issue.

Privacy
PRIVACY is the ability of an individual to control personal data. It is a major concern for those involved with databases especially as privacy is eroded by linking databases.

Data Integrity
DATA INTEGRITY describes the reliability of the data. Reliable data is accurate, current and relevant. There is no guarantee the information on the Net is reliable as it is easy to publish to the Internet.

DATA VALIDATION is used to check the entry of data:
 * RANGE CHECK- is used if the data is restricted to a small range of particular values.
 * LIST CHECK- is used when the data can be compared to a set of accepted data.
 * TYPE CHECK- is used to determine whether the data type is correct.
 * CHECK DIGIT- is a digit calculated from the digits of a code number and then added to that number as an extra digit.

ACCURACY OF DATA- is the extent to which it is free from errors. Data that is collected on individuals is not always accurate. This inaccuracy may be caused by mistakes in gathering or entering data, mismatch of data and the person, or out-of-date information. Opportunities to check and change incorrect data should be provided. It is often necessary to compare data from a number of different sources to determine which data is accurate.

New Trends in the Organisation, Processing, Storage and Retrieval of Data
DATA WAREHOUSES are databases that collect information from different data sources. It is a storage area of raw data that can be analysed to assist organisations to make decisions. The contents of a data warehouse are usually historical and static and will change if new requirements are identified. A more sophisticated approach to obtaining information is data mining.

DATA MINING is a process that looks for relationships and patterns in the data stored in a database. It sorts through the data and turns up interesting and useful connections (e.g. in the transactions at a supermarket). One problem is that many patterns occur by chance and have no value in making decisions. It raises issues of privacy and ownership of data.

]

Online Transaction Processing (OLTP)
OLTP refers to optimising databases to improve online performance. According to http://www.databasejournal.com, this optimising consists of organising the data into layers.
 * Business logic layer – rules and procedures, optimised so that frequent requests are handled fast
 * Presentation layer – the only thing that happens at the client’s end
 * Data layer – all data needed to process transactions

Online Analytical Processing
OLAP is about analysing historical data with a view to extracting information from it. In the past, this has been a slow process because properly normalised databases have data stored in discrete tables. It does this by prebuilding sets of data based on a snapshot of the real database. File size blows out because the data is no longer normalised, but query speed is much higher.

More information available at

http://www.ebroadcast.com.au/lookup/encyclopedia/ol/OLAP.html

http://office.microsoft.com/en‐au/excel/HP101774371033.aspx

Drilling down refers to taking a generalised set of results and looking more deeply at a part of those results. For example, “The rainfall in May was relatively high” contains some information. question “If you ask the question, Did it rain all month or come in bursts?”, you are drilling down.

Drilling down further could look at a day by day, hour by hour or minute by minute analysis of the rainfall. There is a nice demonstration at

http://www.quietlyscheming.com/blog/charts/chart‐drilldown‐animations/