IB/Group 4/Computer Science/Databases/Basic concepts

=Basic concepts=

A.1.1 Outline the differences between data and information.
Computers store data. Data can be any one of several different types (e.g. numeric, text, Boolean, etc.) but has no intrinsic meaning to a human. Data becomes information when it is put into a context that gives it meaning.

For example: 32 23  11  08  40  17 is data, but it has no meaning.

If we provide a context for that data, it becomes information, e.g.: Thus: Information = Data + context
 * The home and away scores for 6 soccer teams last Saturday
 * The temperatures in degrees Celsius for 6 cities around the world at mid-day today
 * The ages in years of the last 6 people to walk through the turnstiles of the Eiffel Tower in Paris.

Strictly speaking, databases store data, not information. However the terminology is commonly used loosely as there is an assumption that data stored in and retrieved from a database is interpreted by human beings within the scope of an appropriate context.

A.1.2 Outline the differences between an information system and a database.
"An information system (IS) is any organized system for the collection, organization, storage and communication of information.

"An information system (IS) is a group of components that interact to produce information."

"..information systems .. are made up of six components: hardware, software, data, people, network, and process."

Information system is therefore a wider term than database as it refers to a complete system, although many (if not most) information systems include one or more databases as part of their overall structure.

A.1.3 Discuss the need for databases.
There are several unique problems that arise when not using a database:

1. Data Size: Storing a small amount of data in a spreadsheet is manageable; however, when the data becomes extensive, a spreadsheet solution is no longer viable. If the number of data records reaches millions, storing the data in multiple spreadsheets can cause speed issues, making it time-consuming to find a specific record.

2. Ease of Updating Data: Multiple users cannot edit the same file simultaneously. As a result, others must wait for the file to become available for updating, leading to wasted time.

3. Accuracy: Data entry into spreadsheets or files may be prone to errors since there is no built-in data validation. Users might input incorrect spellings, dates, or amounts, making it challenging to maintain data accuracy.

4. Security: Data stored in text files or spreadsheets cannot be adequately secured. Unauthorized users can access and read the information, making these storage methods unsuitable for applications requiring privacy, such as banking, healthcare, or payroll departments.

5. Redundancy: Duplication of data is possible when using text files or spreadsheets. Limiting the addition of multiple data copies is difficult, leading to accuracy issues. Managing and updating numerous copies is not an easy task.

6. Incomplete Data: Some data may not be considered important, and therefore not entered into the file due to a lack of validation. This compromises data integrity.

To avoid the aforementioned problems associated with storing data in text files or spreadsheets, a database is necessary.

A.1.4 Describe the use of transactions, states and updates to maintain data consistency (and integrity).
A transaction is a unit of work that is performed against a database. Transactions are units or sequences of work accomplished in a logical order, whether in a manual fashion by a user or automatically by some sort of a database program.

A transaction is the propagation of one or more changes to the database. For example, if you are creating a record or updating a record or deleting a record from the table, then you are performing a transaction on that table. It is important to control these transactions to ensure the data integrity and to handle database errors.

For example, to ensure data consistency when moving money between two accounts it is necessary to complete two operations (debiting one account and crediting the other). Unless both operations are carried out successfully, the transaction will be rolled back.

Consistency, in the context of databases, states that data cannot be written that would violate the database’s own rules for valid data. If a certain transaction occurs that attempts to introduce inconsistent data, the entire transaction is rolled back and an error returned to the user.

Data integrity is the overall completeness, accuracy and consistency of data. This can be indicated by the absence of alteration between two instances or between two updates of a data record, meaning data is intact and unchanged. Data integrity is usually imposed during the database design phase through the use of standard procedures and rules. Data integrity can be maintained through the use of various error-checking methods and validation procedures.

A.1.5 Define the term database transaction.
A transaction comprises a unit of work performed within a database management system (or similar system) against a database, and treated in a coherent and reliable way independent of other transactions.

Transactions in a database environment have two main purposes:

1.To provide reliable units of work that allow correct recovery from failures and keep a database consistent even in cases of system failure, when execution stops (completely or partially) and many operations upon a database remain uncompleted, with unclear status.

2.To provide isolation between programs accessing a database concurrently. If this isolation is not provided, the program's outcome are possibly erroneous.

A database transaction, by definition, must be atomic, consistent, isolated and durable.

Database practitioners often refer to these properties of database transactions using the acronym ACID.

A transactional database is a DBMS where write transactions on the database are able to be rolled back if they are not completed properly (e.g. due to power or connectivity loss).

Most modern relational database management systems fall into the category of databases that support transactions.

In a database system a transaction might consist of one or more data-manipulation statements and queries, each reading and/or writing information in the database.

Users of database systems consider consistency andintegrity of data as highly important.

A simple transaction is usually issued to the database system in a language like SQL wrapped in a transaction, using a pattern similar to the following:

Begin the transaction Execute a set of data manipulations and/or queries If no errors occur then commit the transaction and

end it

If errors occur then rollback the transaction and end it If no errors occurred during the execution of the transaction then the system commits the transaction.

A transaction commit operation applies all data manipulations within the scope of the transaction and persists the results to the database.

If an error occurs during the transaction, or if the user specifies a rollback operation, the data manipulations within the transaction are not persisted to the database.

In no case can a partial transaction be committed to the database since that would leave the database in an inconsistent state.

Internally, multi-user databases store and process transactions, often by using a transactionID or XID.

There are multiple varying ways for transactions to be implemented other than the simple way documented above. Nested transactions, for example, are transactions

which contain statements within them that start new transactions (i.e. sub-transactions). Multi-level transactions are a variant of nested transactions where the sub-transactions take place at different levels of a layered system architecture (e.g., with one operation at the database-engine level, one operation at the operating-system level) [2] Another type of transaction is the compensating transaction.

A.1.6 Explain concurrency in a data sharing situation.
Concurrency control is a database management systems (DBMS) concept that is used to address conflicts with the simultaneous accessing or altering of data that can occur with a multi-user system. If two or more users try to update the contents of a database simultaneously, locks and partitions are put into place to prevent it. Thus enabling greater concurrency.

A.1.7 Explain the importance of the ACID properties of a database transaction.
ACID refers to:  atomicityensures that all operations within the work unit are completed successfully. Otherwise, the transaction is aborted at the point of failure and all the previous operations are rolled back to their former state. consistencyensures that the database properly changes states upon a successfully committed transaction. isolationenables transactions to operate independently of and transparent to each other. durability.ensures that the result or effect of a committed transaction persists in case of a system failure. 

A.1.8 Describe the two functions databases require to be performed on them.
Query functions A query is a request for data or information from a database table or combination of tables. This data may be generated as results returned by Structured Query Language (SQL) or as pictorials, graphs or complex results, e.g., trend analyses from data-mining tools.

ADD, DELETE UPDATE are all functions that can be performed on the database

UPDATE Query is used to modify the existing records in a table. You can use the WHERE clause with the UPDATE query to update the selected rows, otherwise all the rows would be affected.

A.1.9 Explain the role of data validation and data verification.
Data verification is a way of ensuring the user types in what he or she intends, in other words, to make sure the user does not make a mistake when inputting data. ... Validation is about checking the input data to ensure it conforms with the data requirements of the system to avoid data errors. Data validation is the process of ensuring that data is valid. Data validation rules are used in data validation processes to ensure the validity. The term validity of data mostly denotes the meaningfulness and correctness of the data. Data Validation is done on the original document whereas data verification is done on copies. This is the major difference between data validation and data verification.