Glossary - Complete A to Z
Data Migration Glossary
This data migration glossary has been split into sections for faster loading.

A - B - C - D - E - F - G - H - I - J - K - L - M - N - O - P - Q - R - S - T - U - V - W - X - Y - Z

ASCII
- American Standard Code for Information Interchange is the most common format for text files in computers and on the Internet. In an ASCII file, each alphabetic, numeric, or special character is represented with a number.
AIS
- Automated Information System - A combination of computer hardware and software, data, and communications that performs functions for an organisation.
API
- Application Programming Interface - The prescribed method by which a computer program can make requests of an operating system or another application.


Business Rules
- Declarations of constraints on the presentation, storage, and processing or other aspects of entities defined in the environment(s) that an application is intended to support.
Business Process
- A set of interacting activities that produce products or services.
BPR
- Business Process Re-engineering - The act of analysing and optimising the processes required for a company to produce its product.


C
- A structured, procedural programming language that has been widely used both for operating systems and applications and that has had a wide following in the academic community. Many versions of UNIX-based operating systems are written in C.
C++
- An object-orientated programming language that is viewed by many as the best language for creating large-scale applications. C++ is a superset of the C language. A related programming language, Java, is based on C++ but optimised for use on the Internet. Learn more about C++ from its author Bjarne Stroustrup.
Cleanse (data)
- The act of applying rules, logic and other transformations to both remove bad (invalid) data and enhance the value of good data.
CRM - Customer Relationship Management
- Software designed to help companies manage their relationships with their clients more efficiently.
CSV - Comma Separated Value
- A data file which has a physical ASCII file structure that contains records whose values are delimited or separated by commas.
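As a minimal sketch of this structure, the Python snippet below parses a small comma-separated sample with the standard csv module. The sample records and the field names (id, surname, town) are invented for illustration; note that a value containing a comma has to be quoted.

```python
import csv
import io

# Hypothetical sample data: a header row followed by two records.
# The second surname contains a comma, so it is wrapped in quotes.
SAMPLE = 'id,surname,town\n1,Smith,Leeds\n2,"Jones, Jr",York\n'

def read_records(text):
    """Parse CSV text into a list of dictionaries keyed by the header row."""
    return list(csv.DictReader(io.StringIO(text)))

records = read_records(SAMPLE)
print(records[1]["surname"])
```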


Dip Sample
- The process of taking a small random selection of records from a set of records. This is usually part of quality control. The sample records are examined carefully as an indication of quality for the whole set.
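A dip sample could be drawn along the following lines. This is only a sketch: the function name dip_sample, the seed parameter and the generated records are assumptions made for the example, not part of any particular tool.

```python
import random

def dip_sample(records, size, seed=None):
    """Return `size` records chosen at random, without replacement."""
    rng = random.Random(seed)
    # Never ask for more records than exist.
    return rng.sample(records, min(size, len(records)))

# Hypothetical record set of 1,000 rows.
records = [{"id": n} for n in range(1, 1001)]
sample = dip_sample(records, 10, seed=42)
print(len(sample))
```

Fixing the seed makes the same sample reproducible, which can help when the checked records need to be revisited later.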
Data
- Distinct pieces of information, usually formatted in a precise way so as to be useful.
Database
- A collection of data organised so that it can easily be accessed, analysed and updated. The current leading databases include Oracle, Sybase and Microsoft's SQL Server. Most modern relational databases can be accessed using a language called SQL.
Data Dictionary
- In database management systems, a file that defines the basic organisation of a database. A data dictionary contains a list of all files or tables in the database, the number of records in each file, and the names and types of each field.
Data Hub
- See Hub
Data Integrity
- See Integrity
Data Migration
- See Migration
Data Mapping
- See Mapping
Data Mart
- See Mart
Data Warehouse
- See Warehouse
Data Weeding
- See Weed
DBMS
- Short for Database Management System. See Database.
Distributed Database
- A database that consists of two or more data files located at different sites on a computer network. Because the database is distributed, different users can access it without interfering with one another.


EAI - Enterprise Application Integration
- A term for the plans, methods, and tools aimed at modernising, consolidating, and coordinating the computer applications in an enterprise. Typically, an enterprise has existing legacy applications and databases and wants to continue to use them while adding or migrating to a new set of applications.
EBCDIC
- Extended Binary Coded Decimal Interchange Code - A numeric code for alphanumeric characters developed by IBM. It is similar in nature to ASCII, which has more or less replaced it as the normal standard. However, many legacy systems and databases still store their data using EBCDIC.
ETL
- The generic name for software which Extracts, Transforms and Loads data.
Enterprise
- An organisation that uses computers.
Extract


Field
- A space allocated for a particular item of information. In database systems, fields are the smallest units of information you can access. In spreadsheets, fields are called cells.
File
- A collection of data or information that has a name, called the filename. Almost all information stored in a computer must be in a file.
Flat-File Database
- A relatively simple database system in which each database is contained in a single table (or file). In contrast, relational database systems can use multiple tables to store information, and each table can have a different format.
Format
- A specific pre-established arrangement or organisation of data.
Full record Print


Gap Analysis
- Gap analysis is the study of the differences between two different information systems or applications, often for the purpose of determining how to get from one state to a new state. A gap is sometimes spoken of as "the space between where we are and where we want to be." Gap analysis is undertaken as a means of bridging that space.


Hub
- A data hub is a software application that receives data from multiple sources and feeds it to a central database. On the way in, the data is checked for quality and completeness, so problems can be trapped and fixed before the data enters the database. Unlike a data warehouse, a hub continuously interacts with both upstream and downstream systems.


Index
- In a database, indices are used to speed up data access. They are essentially a list of keys or keywords, which identify each unique record. Indices make it faster to find specific records and to sort records on the indexed field(s).
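The idea can be illustrated with a small in-memory sketch. Real database indices are usually disk-based structures such as B-trees; the dictionary below, the build_index helper and the sample records are assumptions made purely for illustration.

```python
from collections import defaultdict

# Hypothetical records; "surname" is the field to be indexed.
records = [
    {"id": 1, "surname": "Smith"},
    {"id": 2, "surname": "Jones"},
    {"id": 3, "surname": "Smith"},
]

def build_index(rows, field):
    """Map each value of `field` to the positions of the rows that hold it."""
    index = defaultdict(list)
    for position, row in enumerate(rows):
        index[row[field]].append(position)
    return dict(index)

surname_index = build_index(records, "surname")
# Look up matching rows through the index instead of scanning every record.
matches = [records[i] for i in surname_index.get("Smith", [])]
print([m["id"] for m in matches])
```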
Integrity
- Refers to the validity of data. Data integrity can be compromised in several ways, including human error when data is entered or errors that occur when data is transmitted from one computer to another.
ISAM - Indexed Sequential Access Method
- A method for managing how a computer accesses records and files stored on a hard disk. ISAM provides direct access to sequentially stored records through an index. This combination results in quick data access.


Java
- Java is a programming language developed by Sun Microsystems for use on the Internet. It was designed to have the "look and feel" of the C++ language, but to be simpler to use. It enforces an object-orientated programming model.
JDBC - Java Database Connectivity
- An application programming interface (API) specification for connecting programs written in Java to the data in popular databases. The API lets you submit statements in Structured Query Language (SQL) that are then passed to the program that manages the database. It returns the results through a similar interface. JDBC is very similar to the SQL Access Group's Open Database Connectivity (ODBC).


Key
- In a database a key is a field used to sort or speed up access to data. In a relational database records are normally identified by a unique combination of keys or fields.


LINUX
- A UNIX-like operating system that was designed to provide personal computer users with a free or very low-cost operating system comparable to traditional and usually more expensive UNIX systems. Linux has a reputation as a very efficient and fast-performing system.
Load
- The act of adding (or storing) data held either in a flat file or in computer memory into a database or software system.
Legacy Data
- The data contained within a legacy system. The data is usually current but awkward to work with, as it is likely to be in an old-fashioned format.
Legacy System
- Jargon for an AIS that is currently in use but was initially deployed many years ago, using a computing infrastructure that is several generations old. These systems tend to be critical to the business and cannot be easily replaced or cost-effectively maintained. They are approaching or have reached the end of their practical operational life span.


Mapping
- A mapping is the relationship between source and target entities. It is an inclusive term that can be applied at different levels, from a single field to a complete system.
Mart
- A data warehouse combines databases across an entire enterprise, while data marts are usually smaller and more focussed. They are normally populated from the data warehouse and allow customised usage with faster execution than a data warehouse.
Metadata
Middleware
- Software designed to establish a relationship (including filtering and transformation) between systems.
Migration
- The process of transferring all or part of an AIS's data to another technical infrastructure. The original application may be upgraded or replaced. The business data (and its schema) usually are retained in a significant way.


Normalise
- In relational database design the process of organising data to avoid duplication is called Normalisation. It involves dividing a database into two or more tables and defining relationships between them. The objective is to isolate data so that modifications of any field can be made just once and then propagated through the rest of the database via the defined relationships. This process reduces storage requirements and ensures data integrity.
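A rough illustration of the idea, using plain Python lists in place of database tables. The flat order data, the field names and the normalise helper are all hypothetical; a real design would be expressed in the database's own schema.

```python
# Hypothetical flat (un-normalised) data: customer details repeat on every row.
flat_orders = [
    {"order_id": 1, "customer": "Smith", "town": "Leeds", "item": "Desk"},
    {"order_id": 2, "customer": "Smith", "town": "Leeds", "item": "Chair"},
    {"order_id": 3, "customer": "Jones", "town": "York", "item": "Lamp"},
]

def normalise(rows):
    """Split flat rows into a customers table and an orders table linked by customer_id."""
    customer_ids = {}  # (name, town) -> customer_id
    orders = []
    for row in rows:
        key = (row["customer"], row["town"])
        # Reuse the existing id for a known customer, otherwise allocate the next one.
        customer_id = customer_ids.setdefault(key, len(customer_ids) + 1)
        orders.append(
            {"order_id": row["order_id"], "customer_id": customer_id, "item": row["item"]}
        )
    customers = [
        {"customer_id": cid, "name": name, "town": town}
        for (name, town), cid in customer_ids.items()
    ]
    return customers, orders

customers, orders = normalise(flat_orders)
print(len(customers), len(orders))
```

After the split, a customer's town is stored once, so a change of address is a single update rather than one per order.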


ODBC - Open DataBase Connectivity
- A standard API for accessing a database. Most of the major databases support ODBC which is used by embedding SQL statements into a program and making requests of the database using defined ODBC procedures.
OLTP - On-Line Transaction Processing
- OLTP is a class of program that facilitates and manages transaction-orientated applications, typically for data entry and retrieval transactions in a number of industries.
OOP - Object Orientated Programming
- A revolutionary concept that changed the rules in computer program development, object-orientated programming is organised around "objects" rather than "actions". The aim is to produce code which is more robust and more re-usable than is possible with sequential programming techniques. C++ and Java are the most popular object-orientated languages today.


PAF - Postcode Address File
- A database file enabling an address to be determined from a postcode.
Pump
- A data pump extracts data from a source system, may perform some filtering and transformation, and loads it into one or more target systems.


Quality Assurance - See also Data Cleansing
- A planned and systematic set of actions to provide adequate confidence that work products and the processes used to produce them conform to established requirements.


RDBMS - Relational DataBase Management System
- See relational database.
Record
- Records are composed of fields, each of which contains one item of information. A set of records constitutes a file, or in a relational database, a table.
Relational Database
- A database that stores data in the form of related tables. Relational databases are powerful because they require few assumptions about how data is related or how it will be extracted from the database. An important feature of relational systems is that a single database record can be spread across several tables.
Replication
- The copying of data from one database to another. In data warehousing replication takes place as data is moved from the on-line transaction processing system into the data warehouse. Replication also takes place if a data mart is being populated with data from the data warehouse.


Schema
Scrub (data)
- See Cleanse (data).
SGML - Standard Generalised Markup Language
- An internationally agreed standard for information representation. SGML can be used to produce files which can be read by people, and exchanged between machines and applications in a straightforward manner.
Source system
- The system which currently contains the data.
Spreadsheet
- A spreadsheet is a sheet of paper that shows accounting or other data in rows and columns. A spreadsheet also refers to a computer program that simulates a physical spreadsheet by capturing, displaying, and manipulating data arranged in rows and columns.
SQL - Structured Query Language
- Pronounced 'see-kwull' ("sequel") or spelled out as S-Q-L - A standardised query language for requesting information from a database.


Table
- Refers to data arranged in rows and columns. A spreadsheet, for example, is a table. In relational database management systems, all information is stored in the form of tables.
Transaction
- A transaction usually means a sequence of information exchange and related work (such as database updating) that is treated as a unit for the purposes of satisfying a request and for ensuring database integrity. An example of a typical transaction might be the purchasing of a ticket.
Transform
- To modify data. This is nearly always required when performing a data migration or data cleansing. An example might be modifying all surnames so that they start with a capital letter. Some transformations can require very sophisticated processing.
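The surname example might look something like this in Python. It is a sketch only: the capitalise_surname function and the sample records are invented, and it deliberately changes nothing but the first letter, so mixed-case names such as "McKay" are left intact.

```python
def capitalise_surname(surname):
    """Make the first letter upper case and leave the rest untouched."""
    surname = surname.strip()
    return surname[:1].upper() + surname[1:]

# Hypothetical source records.
records = [{"surname": "smith"}, {"surname": " jones"}, {"surname": "McKay"}]

# Build new records rather than modifying the source data in place.
transformed = [{**r, "surname": capitalise_surname(r["surname"])} for r in records]
print([r["surname"] for r in transformed])
```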
Translate
- To convert one value or code to another using a lookup table or business rules.
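A lookup-table translation can be sketched as follows. The title codes, the TITLE_LOOKUP table and the translate function are hypothetical; in practice the table would come from the mapping agreed for the migration.

```python
# Hypothetical lookup table mapping legacy title codes to target values.
TITLE_LOOKUP = {"1": "Mr", "2": "Mrs", "3": "Ms", "4": "Dr"}

def translate(code, lookup, default=None):
    """Return the target value for a source code, or `default` if unmapped."""
    return lookup.get(code.strip(), default)

source_codes = ["1", " 3", "9"]
translated = [translate(code, TITLE_LOOKUP, default="UNKNOWN") for code in source_codes]
print(translated)
```

Returning an explicit default for unmapped codes makes gaps in the lookup table visible instead of silently dropping values.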
Target System


UNIX
- An operating system written at Bell Labs. The name is a pun based on an earlier system, Multics. UNIX was written in the C language, which was invented hand-in-hand with UNIX. UNIX source code was distributed freely to universities (due, at least in part, to US trade restrictions on AT&T). Consequently UNIX became the first "open" system that could be improved or enhanced by anyone. There is now a freely available UNIX-like operating system named LINUX.


Validation
VLDB
- Very Large DataBase.
VMS
- VMS is a proprietary operating system developed by Digital Equipment Corporation (DEC), which was purchased by Compaq, which in turn was purchased by Hewlett-Packard. VMS used to run exclusively on VAX processors, but newer DEC machines can also run it on Alpha processors (also called AXP processors). The latter can also run the OSF/1 flavour of UNIX. VMS is called OpenVMS today.


Warehouse
- A permanent database formed of a collection of data extracted and abstracted from source systems.
Weed (data)
- Removing unnecessary, unused or unwanted data for legal, efficiency or clarity reasons.


XML - EXtensible Mark-Up Language
- A pared-down version of SGML, designed for web documents. It allows the creation of customised tags, enabling the definition, transmission, validation, and interpretation of data.


There are currently no terms in the glossary beginning with 'Y'.


There are currently no terms in the glossary beginning with 'Z'.


The following websites were useful when compiling this page.