A Refresher Mathematics

In this appendix, we go through a few mathematical concepts useful in understanding statistics. You can read this in one session or refer to specific sections as need be.

Concepts we intend to refresh on are:

  1. Ratios, Proportions, Percentages and Rates
  2. Introduction to Set theory
  3. Basic algebra
  4. Operations on Polynomials
  5. Factoring Polynomials
  6. Scientific Notation
  7. Rational Exponents and Radicals
  8. Linear Equations and Inequalities in One Variable
  9. Quadratic Equations
  10. Other Pre-Calculus Topics
  11. Elementary Functions
  12. Graphs and Transformations
  13. Introduction to Calculus

A.1 Ratios, Proportions, Percentages and Rates

Ratios, proportions, percentages and rates are some of the most widely used mathematical concepts in reporting statistical outputs. They involve simple calculations, but if not well understood they can lead to misreporting. It is for this reason that we go through these concepts in this section.

A.1.1 Ratios

Ratios are basically a comparison of two numbers, called terms, for example a comparison of the number of girls to boys in a classroom. They can also be viewed as a relationship between two numbers.

Ratios are expressed as \(a \text{ to } b\), \(a \text{ per } b\), \(a:b\) or simply as a fraction.

As an example, let us report number of female smokers to male smokers from this data.

(tab1 <- table(tips$sex, tips$smoker))
##         
##          No Yes
##   Female 54  33
##   Male   97  60
m <- tab1[2, 2]/tab1[1, 2]  # male smokers per female smoker

Here are ways of reporting:

  • There are 33 female smokers to 60 male smokers, or even better
  • There is one female smoker for every 1.82 male smokers (60/33 ≈ 1.82)

A.1.2 Proportions

Proportions are fractions/parts of a whole. In statistics, proportions are used to quantify or determine representation of a given category/observations in a variable. For example, a teacher in a class would be interested in knowing fraction/proportion of girls/boys in a class.

Given data shown below, let us try and compute proportions of each gender by their smoking habits.

(gender.smoker <- table(tips$sex, tips$smoker))
##         
##          No Yes
##   Female 54  33
##   Male   97  60

We should get this

pt <- prop.table(table(tips$sex, tips$smoker)); pt
##         
##                 No       Yes
##   Female 0.2213115 0.1352459
##   Male   0.3975410 0.2459016

A.1.3 Percentages

Percentages express part of a whole as a part of 100, simply put they are proportions multiplied by 100.

From our previous example, we can get percentages by multiplying proportions by 100.

perc <- pt * 100; perc
##         
##                No      Yes
##   Female 22.13115 13.52459
##   Male   39.75410 24.59016

A.1.4 Rates

Rates are ratios (comparison of two values) whose terms are measured in different units.

For example, in athletics, we could measure an athlete's speed in terms of distance covered over time, like 52 kilometers in four hours (52 km per 4 hrs).

Rates can be reduced to a unit rate, which is a rate expressed per quantity of 1. In our example, the athlete's speed can be noted as 13 km per hour (13 km/h).
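
In R this reduction is just a division; a minimal sketch using the example above (the object names are illustrative):

distance_km <- 52
time_hrs <- 4
distance_km / time_hrs     # unit rate in km per hour
## [1] 13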

A.2 Introduction to Set Theory

In our probability chapter, we will use a lot of set notation and concepts. For that reason, in this section we refresh on elementary concepts of set theory. We shall review two topics: set properties and notation, and set operations.

A.2.1 Set Properties and Notation

A set is a collection of objects or things. Sets are denoted by capital letters such as A, B and C.

Objects in a set are called elements or members of a set. \(\in\) notation is used to show an element is a member of a set, for example \(c \in A\) which means \(c\) is an element in set \(A\). \(\notin\) is used to show an element is not a member of a given set.

A set without any element is referred to as an empty or null set. For example a set of all negative values in 1:10. Empty sets are denoted with \(\varnothing\) notation.

To refer to a complete set, its elements are enclosed in curly brackets, for example “{1, 2, 3, 4, 5}” for a set with the numbers 1 through 5.

Other than listing all values in a set, a rule can be used to describe the properties of the elements a set contains, for instance a set of all values above a given number.

Let us expound on this. Suppose we had data on heights of children and we wanted only those above a certain height; we can express this in set notation by using a set builder like this:

\[\{height \mid height > 3\text{ ft } 2\text{ in}\}\]

Where:

  • {} are used to denote a set (of the variable height)
  • | means “such that” (in other settings “|” can be replaced with “:”)

This expression is read as, “a set of all heights such that height is greater than 3 feet and 2 inches”. Therefore the variable height must contain only heights above 3 ft 2 in.

For each rule a listing can be generated; a listing shows the possible elements meeting the rule's condition. If a listing continues indefinitely then “…” can be used to show this pattern of continuity. For example:

Rule Listing
{x | x is an alphabetical letter} {a, b, c, …, z}
{x | x^2 = 9} {-3, 3}
{x | x is an even number} {2, 4, 6, …}

The first two examples are referred to as finite sets (elements can be counted and there is an end) while the last is referred to as an infinite set (there is no end to counting elements).

If each element in a set is in another set, for example all elements of set A are in set B, then set A is a subset of set B. Note, a set can also be a subset of itself, in which case the two sets are said to be equal. Symbols used to denote these relationships are \(\subset\) for a subset, \(=\) for equality (two sets have the same elements), \(\not\subset\) for not a subset, and \(\neq\) for two sets without the same elements.

It has been proven that a null or empty set (\(\varnothing\)) is a subset of all sets, though this proof is beyond the scope of this section.

Set of all elements under consideration is called a universal set denoted by \(U\).

A.2.2 Set Operations

There are four basic set operations, these are:

  • Union
  • Intersection
  • Complement
  • Difference

Set operations are best shown using a Venn diagram. A Venn diagram is a display showing all possible logical relationships between a finite collection of different sets. These diagrams consist of overlapping circles within a rectangle; the overlapping area indicates shared elements and the rectangle indicates the universal set.

Union set

Union of sets is a combination of all elements of sets under consideration. For example, union of set A with elements {a, b, c} and set B with elements {c, d, e} is {a, b, c, d, e}. Note we only have unique values in output of a union.

Symbolically, A union B can be shown as \(A \cup B\), where \(\cup\) denotes union.

Diagrammatically this can be shown as:

Shaded area: A union B


Intersection

An intersection is the set of elements that are in all the sets of interest, basically the set of elements the sets share. Symbolically this can be shown as \(A \cap B\), where \(\cap\) means intersection.

Diagrammatically this can be shown as:

Shaded area: A intersect B


Sets A and B are said to be disjoint if they share no similar element or \(A \cap B = \varnothing\).

Disjointed sets


Complement

A complement is the set of elements not contained in a set of interest. For example, if a universal set \(U\) contains all elements, and set \(A\) contains a few elements from this universal set, then all elements in \(U\) and not in \(A\) are the complement of \(A\). Complement sets are denoted with \('\), for instance \(A'\), which is read as “\(A\) prime”.

Using a Venn diagram this can be shown as:

Shaded area: complement set


Difference Set

A difference set contains only the elements of one set that are not contained in another set. For example, \(A - B\) is the set of all elements in \(A\) not contained in \(B\).

Shaded area: A - B

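
All four operations have base R equivalents: union(), intersect(), and setdiff(); a complement can be obtained as the difference from a universal set. A small sketch using the sets from the union example above:

A <- c("a", "b", "c")
B <- c("c", "d", "e")
U <- letters[1:6]            # a universal set, only for the complement example

union(A, B)                  # A union B
## [1] "a" "b" "c" "d" "e"
intersect(A, B)              # A intersect B
## [1] "c"
setdiff(U, A)                # complement of A relative to U
## [1] "d" "e" "f"
setdiff(A, B)                # difference A - B
## [1] "a" "b"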

A.3 Basic Algebra

Understanding probability theory requires some basic knowledge of algebra which we will use to compute different probabilities. In this regard, in this section we shall look at core concepts in algebra: the set of real numbers, the real number line, real number properties, and fraction properties.

A.3.1 Set of real numbers

A number system is a writing system used to express numbers; they are mathematical notations for representing numbers of a given set.

There are several number systems but most often used number system is the “real number” system.

A real number can be viewed as any number with a decimal representation. Table below shows set of all real numbers and some important subsets.

Symbol Name Description Examples
\(\mathbb{N}\) Natural numbers Counting numbers (also positive integers) 1, 2, 3, …
\(\mathbb{Z}\) Integers Natural numbers, their negatives, and 0 …, -2, -1, 0, 1, 2, …
\(\mathbb{Q}\) Rational numbers Numbers which can be represented as a/b, where a and b are integers and b \(\neq\) 0; decimal representations are repeating or terminating -4, 0, 1, 25, \(\frac{-3}{5}\), \(\frac{2}{3}\), 3.67, -0.33\(\overline{3}\), 5.2727\(\overline{27}\)
\(\mathbb{I}\) Irrational numbers Numbers which can be represented as non-repeating and non-terminating decimal numbers \(\sqrt{2}\), \(\pi\), 1.414213…, 2.71828182…
\(\mathbb{R}\) Real numbers Rational and irrational numbers

Source: Barnett, R.A.

A.3.2 Real number line

All real numbers can be positioned as a point on a line referred to as a “real number line”. Each point on a real number line corresponds to one real number, this real number is called a coordinate of the point.

Origin is the point with coordinate 0. To the right of the origin are the “positive real numbers” while to the left are the “negative real numbers”. The origin 0 is neutral as it is neither positive nor negative.

Real number line


A.3.3 Real number properties

In order to convert algebraic expressions into equivalent forms, some basic properties of real number system are necessary. These properties will be especially useful when discussing calculus.

Here we shall be reviewing four basic properties of the set of real numbers, these are:

  • Associative property
  • Commutative property
  • Identity property and
  • Inverse property

Associative refers to “grouping” or “regrouping” elements (note, this does not mean simplification). Commutative refers to how elements are moved around. An identity is a number which, when combined with another number under an operation, leaves that number unchanged. Inverse means opposite or reverse; an inverse is another number on the real number line which, when combined with the original on the left or right through an operation (+ or *), outputs the identity value.

Under each of these properties, we look at addition, multiplication, and distributive (combination of multiplication and addition) operations.

As examples, we shall use \(a, b, \text{ and } c\) as arbitrary elements in a set of real numbers \(\mathbb{R}\).

Addition Properties

Associative

When elements are grouped or regrouped in an addition computation, output remains the same. That is, whichever way these elements are grouped, output will remain constant.

\[\therefore a + (b + c) = (a + b) + c\]

Commutative

Commutative property of addition states that order of elements does not matter as it results to same output.

\[\therefore a + b = b + a\]

Identity

Here we are looking for a real number (identity) which, when added to another number, results in that number; this number is zero.

\[\therefore 0 + a = a + 0 = a\]

Inverse

The additive inverse is the negation of a number, so for a real number \(a\), its inverse is \(-a\).

\[\therefore a + (-a) = (-a) + a = 0\]

Multiplicative Properties

Associative

Just like additive associations, grouping or regrouping elements of a multiplicative operation results in the same output.

\[\therefore a(bc) = (ab)c\]

Commutative

Like the commutative property of addition, changing the order of a multiplication of elements of the real number line results in the same output.

\[\therefore ab = ba\]

Identity

Identity for multiplication is 1

\[\therefore (1)a = a(1) = a\]

Where \(a\) is any real \(\mathbb{R}\) number

Inverse

The multiplicative inverse (or reciprocal) of a real number \(a\) is \(1/a\). Note, \(a\) cannot be 0, as division by zero is not defined: 0 cannot be a divisor.

\[\therefore a(1 \div a) = (1 \div a)a = 1\]

Distributive Properties

Used when an operation involves both addition and multiplication. This property can also be referred to as “distributive property of multiplication over addition”.

This property means that when a term multiplies other terms in parentheses, simplification should be performed by “distributing” the multiplication over the terms in the parentheses.

\[\therefore a(b + c) = ab + ac\]

also

\[(a + b)c = ac + bc\]

It is worth noting that, relative to addition, commutativity and associativity are used to change order of addition as well as insert or remove parenthesis as need be. However, the same cannot be done for subtraction and division.

A.3.4 Additional Properties

Using the preceding operations (addition and multiplication), subtraction and division can be expressed as:

Subtraction

For any real number a and b;

\[a - b = a + (-b)\]

Division

For any real number \(a\) and \(b\) and where \(b \neq 0\);

\[\frac{a}{b} = a(\frac{1}{b})\]

Zero Properties

For all real numbers \(a\) and \(b\):

  1. \(a * 0 = 0\)
  2. \(ab = 0 \text{ if and only if } a = 0 \text{ or } b = 0\)

A.3.5 Fraction properties

Division in the form \(a \div b\) and where \(b \neq 0\) can be written as \(\frac{a}{b}\). Top part of this division (element \(a\)) is called numerator and bottom part is called denominator.

\[\frac{\text{numerator}}{\text{denominator}}\]

A.4 Operations on Polynomials

In this section we refresh on one of the most frequently used mathematical concepts, and that is Polynomials. We shall discuss how to work with polynomials which form a core basis in most statistical models. But before that we re-look at exponents and specifically natural number exponents which we will use in our polynomials.

Here are the core concepts we will be reviewing:

  1. Natural Number Exponents
  2. Polynomials
  3. Shape of Polynomials
  4. Combining like terms
  5. Addition and subtraction
  6. Multiplication
  7. Combined operations

A.4.1 Natural Number Exponents

Repeated multiplication of natural numbers (counting numbers or positive integers) \(\mathbb{N}\) is often simplified using exponents. An exponent gives the number of multiplication repetitions, or the multiplication factor. Exponents are also called powers.

Any natural number \(a\) multiplied by itself \(x\) times can be expressed as:

\[a^x\]

Where:

  • \(a\) is called a base (natural number being multiplied) and
  • \(x\) is called an exponent (multiplication factor)

\(a^x\) is read, “\(a\) raised to the exponent of \(x\)”.

Two often-used exponents are two and three, that is, a base multiplied by itself twice or thrice.

Example:

Exponential form Expanded form Output
\(2^2\) 2 x 2 4
\(2^3\) 2 x 2 x 2 8
\(2^4\) 2 x 2 x 2 x 2 16

A.4.1.1 Exponents

  • They tell us how many times a natural number should be multiplied
  • A negative exponent means divide (inverse of multiplication)
  • A fractional exponent like 1/n means taking the nth root, e.g. \(7^{\frac{1}{2}}\) = \(\sqrt{7}\) and \(21^{\frac{1}{3}}\) = \(\sqrt[3]{21}\)

Natural Sequence of Exponents

Rule Example
\(a^1 = a\) \(3^1\) = 3
\(a^0 = 1\) \(3^0\) = 1
\(a^{-1} = \frac{1}{a}\) \(3^{-1}\) = 0.3333333

First property of Exponents

This is also known as product of exponents properties. It is used to simplify multiplication of two natural number exponents with similar base.

When two exponents with the same base are multiplied, their expanded form is the same as addition of exponents.

For example, \(2^3\) x \(2^3\) can be expanded to (2 x 2 x 2) * (2 x 2 x 2) = 2 x 2 x 2 x 2 x 2 x 2; a total of 6 2’s. This can be simplified to \(2^6\) giving us 64.

\[\therefore a^x * a^y = a^{x+y}\]

If the terms have constant coefficients, multiply the coefficients and then add the exponents of the common base. For example:

\[10x^2 * 5x^3 = (10*5)x^{2+3} = 50x^5\]

This leads us to our first (and most important) property of exponents; it states that, for any natural number \(m\) and \(n\), and any real number \(b\):

\[b^mb^n = b^{m+n}\]

The following properties can be reasoned in the same way as above:

  • \(x^m/x^n = x^{m-n}\)
  • \((x^m)^n = x^{mn}\)
  • \((xy)^n = x^ny^n\)
  • \((x/y)^n = x^n/y^n\)
  • \(x^{-n} = 1/x^n\)
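
These properties are easy to verify numerically in R; a quick sketch with arbitrary values:

x <- 2; m <- 3; n <- 4

x^m * x^n == x^(m + n)           # product property
## [1] TRUE
(x^m)^n == x^(m * n)             # power of a power
## [1] TRUE
x^-n == 1 / x^n                  # negative exponent
## [1] TRUE
all.equal(7^(1/2), sqrt(7))      # fractional exponent as a root
## [1] TRUE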

A.4.2 Polynomials

Algebraic expressions are numbers (constants/coefficients), symbols (variables like \(x\) and \(y\)) and operators (addition, subtraction, multiplication, division) grouped together to denote a value. Terms are the individual objects of an expression, that is, individual numbers, variables (symbols), or numbers and variables.

A mathematical expression


Polynomials are special algebraic expressions consisting of several terms. They are formed by constants, variables and non-negative integer exponents combined with addition, subtraction, and multiplication, but not division.

Examples of polynomials Examples of non-polynomials
\(2x^4 + 3x^7 + 20\) \(\frac{1}{x}\)
\(2xy^2 + 5xy^3 + 2\) \(2x^{-2} + 5x^2\)
\(2x + 3x + 1\) \(\frac{a + b}{a^2 - b}\)
\(5\) or \(0\)

Polynomials are constructed with one, two, three or more terms, for example a polynomial with:

  • one variable is expressed by adding or subtracting constants and terms of the form \(ax^n\)
  • two variables is expressed by adding or subtracting constants and terms of the form \(ax^my^n\)
  • three variables is expressed by adding or subtracting constants and terms of the form \(ax^my^nz^o\)
  • for more than three variables we use similar pattern as above

A.4.2.1 Classifications of Polynomials by their degree

Degree refers to the highest exponent in a polynomial. The highest exponent for a one-variable polynomial is simply its highest exponent, but for two or more variables, the degree is the largest value obtained after totaling the exponents within each term. For example, \(6xy^2 + 3xy^4 +2\) is a degree-5 polynomial because the first term has an exponent total of \(1 + 2 = 3\), the second term has an exponent total of \(1 + 4 = 5\) and the final term, which is a constant, has an exponent of 0.

Degree can be written as deg(\(6xy^2 + 3xy^4 + 2\)) = 5

Below is a table with names of degrees for equations with one variable.

Names of degrees

Degree Name Example
0 Constant 5
1 Linear \(2x + 10\)
2 Quadratic \(2x^2 + 5\)
3 Cubic \(2x^3 + x + 3\)
4 Quartic \(2x^4 + 2x^2 + 6\)
5 Quintic \(2x^5 + 3x + 2x^2 + 3\)

Note:

  • higher order equations (those with high degree; > 2) are harder to solve
  • Polynomials are often written with highest degree first, this is called standard form
  • polynomials of one variable are easy to plot as they have smooth and continuous lines
  • A single term polynomial is called monomial, a two-term polynomial binomial and three-term polynomial trinomial

A.4.3 Shape of polynomials

Shape of a polynomial’s graph is connected to its degree; for odd-degree polynomials (\(f(x)\) = \(x\) or \(x^3\) or \(x^5\)) with a positive coefficient, the graph starts from the negative, ends on the positive, and crosses the x-axis at least once. For even-degree polynomials (\(f(x)\) = \(x^2\) or \(x^4\) or \(x^6\)) with a positive coefficient, the graph starts positive and ends in the positive. Even polynomials can cross the x-axis once, twice or not at all.

Graphs of polynomial functions are continuous meaning they do not have holes or breaks. In addition, these graphs do not have sharp corners as one would expect from a graph of an absolute function.

Each graph of a polynomial with a certain degree has an expected maximum number of vertices. Vertices for these continuous graphs are points separating an increasing portion and a decreasing portion or vice versa.

In general, the graph of a polynomial function of a positive degree \(n\) can have at most \((n-1)\) vertices and can cross the x-axis at most \(n\) times.

op <- par("mfrow")
par(mfrow = c(2, 3))

# First-degree polynomial
x <- seq(-5, 5, 0.01)
plot(c(-5, 5), c(-5, 5), type = "n", ann = FALSE)
lines(x, 0.5*x, col = 4)
title("First-degree polynomial", line = 1)
title(xlab = "x", ylab = "h(x)", line = 2)
title(sub = "n = 1, therefore 0 vertices", line = 3)
text(4.2, -1.4, labels = expression(paste("f(x) = ", 0.5*x)), srt = 90)

# Third-degree polynomial
third <- expression(x^3 - 2*x)
third_vertices <- c(-sqrt(2/3), sqrt(2/3))
x <- sort(c(third_vertices, seq(-2, 2, 0.01)))
plot(c(-5, 5), c(-5, 5), type = "n", ann = FALSE)
lines(x, eval(third), col = 4)
title("Third-degree polynomial", line = 1)
title(xlab = "x", ylab = "j(x)", line = 2)
title(sub = "n = 3, therefore 2 vertices", line = 3)
points(third_vertices, y = eval(third)[which(x %in% third_vertices)], pch = 21, bg = 4)
text(4.2, 0, labels = expression(paste("j(x) = ", x^3 - 2*x)), cex = 0.9, srt = 90)


# Fifth-degree polynomial
fifth_vertices <- c(-1.64443286, -0.5439123, 1.64443286, 0.5439123)
x <- sort(c(fifth_vertices, seq(-2, 2, 0.01)))
fifth <- expression(x^5 - 5*x^3 + 4*x + 1)
plot(c(5, -5), c(5, -5), type = "n", ann = FALSE)
title("Fifth-degree polynomial", line = 1)
title(xlab = "x", ylab = "f(x)", line = 2)
title(sub = "n = 5, therefore 4 vertices", line = 3)
lines(x, eval(fifth), col = 4)
x <- fifth_vertices
points(fifth_vertices, eval(fifth), pch = 21, bg = 4)
text(4.2, 0, labels = expression(paste("f(x) = ", x^5 - 5*x^3 + 4*x + 1)), cex = 0.8, srt = 90)


# Second-degree polynomial
x <- seq(-2, 2, 0.01)
second <- expression(x^2 - 2)
plot(c(-4, 4), c(-4, 4), type = "n", ann = FALSE)
title("Second-degree polynomial", line = 1)
title(xlab = "x", ylab = "H(x)", line = 2)
title(sub = "n = 2, therefore 1 vertex", line = 3)
lines(x, eval(second), col = 4)
points(0, -2, pch = 21, bg = 4)
text(3.7, 0, labels = expression(paste("H(x) = ", x^2 - 2)), cex = 0.9, srt = 90)

# Fourth-degree polynomial
fourth <- expression(2*x^4 - 4*x^2 + x - 1)
fourth_prime <- expression(8*x^3 - 8*x + 1)
fourth_vertex <- c(-1.0574538, 0.1270510, 0.9304029)
x <- sort(c(fourth_vertex, seq(-1.7, 1.6, 0.01)))
plot(c(-5, 5), c(-5, 5), type = "n", ann = FALSE)
title("Fourth-degree polynomial", line = 1)
title(xlab = "x", ylab = "J(x)", line = 2)
title(sub = "n = 4, therefore 3 vertices", line = 3)
lines(x, eval(fourth), col = 4)
x <- fourth_vertex
points(x, eval(fourth), pch = 21, bg = 4)
text(4.2, 0, labels = expression(paste("J(x) = ", 2*x^4 - 4*x^2 + x - 1)), cex = 0.8, srt = 90)

# Sixth-degree polynomial
sixth <- expression(x^6 - 7*x^4 + 14*x^2 - x - 5)
sixth_prime <- expression(6*x^5 - 28*x^3 + 28*x - 1)
sixth_vertices <- c(-1.777750, 0.035760, 1.807227, -1.237497, 1.172260)
x <- sort(c(seq(-2.3, 2.3, 0.01), sixth_vertices))
plot(c(-5, 5), c(-5, 5), type = "n", xlab = "x", ylab = "")
lines(x, eval(sixth), col = 4)
x <- sixth_vertices
points(sixth_vertices, eval(sixth), pch = 21, bg = 4)
title("sixth-degree polynomial", line = 1)
title(xlab = "x", ylab = "F(x)", line = 2)
title(sub = "n = 6, therefore 5 vertices", line = 3)
text(4.2, 0, labels = expression(paste("F(x) = ", x^6 - 7*x^4 + 14*x^2 - x - 5)), cex = 0.7, srt = 90)


par(mfrow = op)

A.4.4 Combining like terms

Like terms are terms with similar variables and exponents but they could have different coefficients (constant preceding a term). For example \(10x\) and \(6x\) are like terms.

Note, if a term has no constant before a variable, then the coefficient is understood to be 1. If no constant appears but a negative (-) sign appears in front, then it is understood to be -1. Example: \(5t^3 - t^3 + 6\) has coefficients 5, -1, and 6.

There are some distributive properties which are necessary for the process of combining like terms, these are:

  1. a(b + c) = (b + c)a = ab + ac
  2. a(b - c) = (b - c)a = ab - ac
  3. a(b + c + … + f) = ab + ac + … + af

Now let’s do one example of combining like terms:

\[10xy^2 + 2xy^2 + xy^2 + xy + 3\]

Like terms in this example are our first three terms: \(10xy^2\), \(2xy^2\), and \(xy^2\). \(xy\) is not a like term as \(y\) does not have exponent 2.

\[\therefore 10xy^2 + 2xy^2 + xy^2 + xy + 3 = (10xy^2+2xy^2+xy^2) + xy + 3 = 13xy^2 + xy + 3\]

Note:

  • Where parenthesis are present, we begin by clearing expressions in parenthesis using distributive properties then combine like terms. For example \(9(x^2 + y^2) - 3(2x^2 - 3y^2)\) can be simplified to \(3x^2 + 18y^2\)
  • Always keep track of signs; each term is either positive or negative (except for 0 which is sign-less)

A.4.5 Addition and subtraction

Additions and subtractions of polynomials involves removing parentheses and combining like terms.

Let’s add the following three polynomials as our example:

\[5x^2 - 2x + 6 \\ 2x^3 +x + 3 \\ -x^3 - 2\]

  1. Arrange the addition \[(5x^2 - 2x + 6) + (2x^3 + x + 3) + (-x^3 - 2)\]

  2. Remove parentheses (factoring in signs)

\[5x^2 -2x + 6 + 2x^3 + x + 3 - x^3 - 2\]

  3. Putting like terms together (from highest exponent)

\[2x^3 - x^3 + 5x^2 - 2x + x + 6 + 3 - 2\]

  4. Simplify like terms

\[x^3 + 5x^2 - x + 7\]

Subtraction of polynomials follows similar procedures.

A.4.6 Multiplication

Multiplication of algebraic expressions like polynomials, requires extensive use of distributive properties for real numbers as well as other real number properties.

For this, we shall use the following two polynomials:

\[(3x^3 - 2x^2)(9x^3 + x^2 + 5)\]

We multiply the first term by all terms in the second polynomial, then the second term by all of the second polynomial's terms. This should result in:

\[27x^6 + 3x^5 + 15x^3 -18x^5 - 2x^4 - 10x^2\]

Putting like terms together simplifies it to:

\[27x^6 - 15x^5 - 2x^4 + 15x^3 - 10x^2\]
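
One quick way to check an expansion like this in R is to evaluate the factored and expanded forms at a few arbitrary values of \(x\) and compare; a rough sketch:

x <- c(-2, 0.5, 1, 3)   # arbitrary test points

factored <- (3*x^3 - 2*x^2) * (9*x^3 + x^2 + 5)
expanded <- 27*x^6 - 15*x^5 - 2*x^4 + 15*x^3 - 10*x^2

all.equal(factored, expanded)
## [1] TRUE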

Note:

  • Products of binomial (two-term polynomial) factors occur frequently, thus some handy formulas for their products are given below

Special Products

  1. \((a - b)(a + b) = a^2 - b^2\)
  2. \((a + b)^2 = a^2 + 2ab + b^2\)
  3. \((a - b)^2 = a^2 - 2ab + b^2\)

A.4.7 Combined operations

For combined operations, polynomials will often have several groupings using different symbols like parentheses “()”, brackets “[]” and curly braces “{}”.

To simplify these polynomials, it is best to remove the grouping symbols from the inside out, that is, from “()” to “[]” and finally “{}”.

In terms of operations precedence, multiplication and division precede addition and subtraction while taking exponents precedes multiplication and division.

As an example, let’s simplify this polynomial:

\[2 + \{4x^2 - [4x^3 - 2x^2(x + 3)]\}\]

Begin by removing inner “()”

\[2 + \{4x^2 - [4x^3 - 2x^3 - 6x^2]\}\]

Remove “[]” (multiplying by -1)

\[2 + \{4x^2 - 4x^3 + 2x^3 + 6x^2\}\]

Remove “{}”

\[2 + 4x^2 - 4x^3 + 2x^3 + 6x^2\]

Now we simplify

\[-2x^3 + 10x^2 + 2\]

A.5 Factoring Polynomials

In this section we look at concepts of factoring polynomials which can be quite handy in simplification and graphing.

We will specifically look at: common factors, factoring by grouping, factoring second-degree polynomials, special factoring formulas, and factoring with the rational zeros theorem.

A.5.1 Common Factors

This is an initial process of factoring and it involves factoring out factors common in all terms.

Example

Given

\[6z^2w^3 + 3z^4w^2 - 9z^2w^2\]

we can factor out a common factor which is \(3z^2w^2\) giving us:

\[3z^2w^2(2w + z^2 - 3)\]

A.5.2 Factoring by grouping

Other than factoring out common factors, terms can be grouped in such a way that it makes it efficient to complete the factoring process for polynomials. There is no rule of thumb here, but it is important to take into account the sign of each term.

Example

Given this function:

\[6z^2 + 3z - 4z - 2\]

we can group its terms as:

\[3z(2z + 1) - 2(2z + 1)\]

Which become:

\[(3z - 2)(2z + 1)\]

If we multiplied these groups we should get our original function \(6z^2 + 3z - 4z - 2\).

A.5.3 Factoring Second-Degree Polynomial

Second-degree polynomials are widely used in statistical models. Some of these polynomials can be factored into first-degree polynomials with integer coefficients, which is handy for a number of tasks including determining the points where \(y = f(x) = 0\).

Since not all second-degree polynomials can be transformed into two first-degree polynomials, it is good to start off by checking whether the transformation is possible. We do this using a factorability check called the ac evaluation.

For a polynomial

\(ax^2 + bx + c\) or \(ax^2 + bxy + cy^2\),

we can determine if it has first-degree factors with integer coefficients by:

  1. taking a product of \(a\) and \(c\), that is \(ac\) and
  2. look for two factors of \(ac\) which sum up to \(b\) (coefficient of the second term)

If these two factors exist, then polynomial has first-degree factors with integer coefficients and we can label these two factors as \(p\) and \(q\).

Basically this,

\[pq = ac \qquad{} \text{ and } \qquad{} p + q = b\]

must be satisfied.

Therefore, once we know \(p\) and \(q\) exist, then we can use our “factoring by grouping” knowledge to formulate these two first degree polynomials.

Example

Given \(9z^2 + 80z - 9\), we begin by checking if we have \(p\) and \(q\) such that \(pq\) equals \(ac\).

In this example, \(a = 9\) and \(c = -9\), thus \(ac = -81\). Two factors which sum to \(80\) are \(-1\) and \(81\), we therefore have \(p\) and \(q\) and as such we can factor it out using integer coefficients.

We do this by substituting \(b\) with \(p\) and \(q\), grouping them and then factoring out common factors.

\[9z^2 - z + 81z - 9\]

\[(9z^2 - z) + (81z - 9)\]

\[z(9z - 1) + 9(9z - 1)\]

\[(z + 9)(9z - 1)\]

Again if we multiplied \((z + 9)\) with \((9z - 1)\) we should get our original polynomial \(9z^2 + 80z - 9\).
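
The search for the pair \(p\) and \(q\) in the ac evaluation can be automated with a small helper; a sketch (the function name ac_factors is made up for illustration):

# Find integer pairs (p, q) with p*q == a*c and p + q == b
ac_factors <- function(a, b, c) {
  ac <- a * c
  divisors <- (-abs(ac)):abs(ac)
  divisors <- divisors[divisors != 0 & ac %% divisors == 0]
  p <- divisors[divisors + ac / divisors == b]
  unique(cbind(p = p, q = ac / p))
}

ac_factors(9, 80, -9)   # for 9z^2 + 80z - 9
##       p  q
## [1,] -1 81
## [2,] 81 -1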

A.5.4 Special Factoring Formulas

There are special factoring formulas that ease the process of factoring certain polynomials which appear frequently. These are:

  1. Perfect square: \(u^2 + 2uv + v^2 = (u + v)^2\)
  2. Perfect square: \(u^2 - 2uv + v^2 = (u - v)^2\)
  3. Difference of squares: \(u^2 - v^2 = (u - v)(u + v)\)
  4. Difference of cubes: \(u^3 - v^3 = (u - v)(u^2 + uv + v^2)\)
  5. Sum of cubes: \(u^3 + v^3 = (u + v)(u^2 - uv + v^2)\)

Notice the pattern formed by the differences: we are multiplying a first-degree difference by its expanded expression of degree \(n - 1\). Therefore we can write the difference of fifth powers as:

\[u^5 - v^5 = (u - v)(u^4 + u^3v + u^2v^2 + uv^3 + v^4)\]

Examples

  1. \(9z^2 - 4y^2\) is the same as \((3z)^2 - (2y)^2\) which can be factored out with difference of squares \((3z - 2y)(3z + 2y)\)
  2. \(9(z - 2)^2 - 4y^2\) can be factored to \([3(z - 2) - 2y][3(z - 2) + 2y]\)

A.5.5 Factoring polynomials with the rational zeros theorem

Factoring polynomials of higher degree (> 3) can become quite challenging especially when using techniques already discussed. For that reason it might be good to use other methods.

One method is the Rational Zeros Theorem. This theorem basically locates all \(x\) values which equate a given function to zero. These \(x\) values are called roots of the function.

This theorem uses the coefficients of the highest-degree term and the last term (the constant). Its reasoning is that any rational root of a function will be a ratio of a factor of its constant to a factor of its leading coefficient. Symbolically we can express these possible roots of a function as \(\frac{p}{q}\), where \(p\) is a factor of its constant and \(q\) is a factor of its leading coefficient. Take note, not every \(\frac{p}{q}\) will be a root, therefore we need to determine which among them yields a root.

To do this we need to do four things:

  1. Arrange polynomial in a decreasing order, that means having highest degree term first and constant last. It also means that all degree terms must be given; for those that are not there a zero term can be added like \(0x^3\).
  2. Determine all factors of constant and leading coefficient, these includes their negative values.
  3. Compute all combinations of \(\frac{p}{q}\) and eliminate any duplicates.
  4. Use the division method to determine which \(\frac{p}{q}\)’s are roots.

Let us go over one example to grasp this concept.

Example

Given

\[j(x) = -5x^4 - 4x^3 + 42x^2 + 12x - 45\]

we want to find all its roots. From our basic algebra we know there are at most 4 of these since it is a fourth-degree polynomial.

Our initial activity is to order our polynomial and include 0 terms where needed. Since our polynomial is in good order, then we can proceed to our next activity.

From our equation the constant is -45 and the leading coefficient is -5. Factors \(p\) of the constant are 1, 3, 5, 9, 15, 45 and their negatives. Factors \(q\) of the leading coefficient are 1, 5 and their negatives.

Now we need to get all unique combinations of \(\frac{p}{q}\). These are:

p <- c(1, 3, 5, 9, 15, 45)
q <- c(1, 5)
possible_vals1 <- expand.grid(p, q)
possible_vals2 <- expand.grid(p, -1 * q)
possible_vals <- rbind(possible_vals1, possible_vals2)
p_over_q <- possible_vals[,1]/possible_vals[,2]
p_over_q <- unique(p_over_q)
p_over_q
##  [1]   1.0   3.0   5.0   9.0  15.0  45.0   0.2   0.6   1.8  -1.0  -3.0
## [12]  -5.0  -9.0 -15.0 -45.0  -0.2  -0.6  -1.8

Our final step is to determine which among these \(\frac{p}{q}\) are roots. We shall do this by dividing our function by a one-degree polynomial formed from each of these \(p/q\)’s. Therefore our equation will be the dividend and each of these one-degree polynomials will be the divisor. The idea here is to determine which outputs a zero remainder. We should also note that dividing by a one-degree polynomial leaves a quotient of one degree less. By reducing these polynomials we are left with polynomials which we can solve for \(x\) using previously discussed methods.

For our initial division, we will have our dividend as

\[-5x^4 - 4x^3 + 42x^2 + 12x - 45\]

and our divisor for \(\frac{p}{q} = 1\) as \(x - 1\)

We can now determine its quotient and remainder as we do with any other division.

\[x - 1 )\overline{-5x^4 - 4x^3 + 42x^2 + 12x - 45}\]

The core idea of this division is to determine terms which, when subtracted, clear (reduce to zero) the leading term of the dividend, and which are divisible by the leading term of the divisor. Therefore we begin by dividing the first term of our dividend by the first term of our divisor; the output should clear that first term when it is multiplied back and subtracted. In this case \(-5x^4\) divided by \(x\) is \(-5x^3\), which we place above our division bar.

\[-5x^3\\ x - 1) \overline{-5x^4 - 4x^3 + 42x^2 + 12x - 45}\]

We proceed by multiplying \(-5x^3\) by our divisor \(x - 1\) to get \(-5x^4 + 5x^3\) and place it right below our dividend's first two terms.

\[-5x^3\\ x - 1) \overline{-5x^4 - 4x^3 + 42x^2 + 12x - 45}\\ -5x^4 + 5x^3\qquad{} \qquad{} \qquad{} \quad{}\]

We follow this by subtracting \(-5x^4 + 5x^3\) from \(-5x^4 - 4x^3\) and placing the output below a line drawn under \(-5x^4 + 5x^3\).

\[-5x^3\qquad{}\qquad{}\qquad{}\qquad{}\qquad{}\\ x - 1) \overline{-5x^4 - 4x^3 + 42x^2 + 12x - 45}\\ -5x^4 + 5x^3\qquad{} \qquad{} \qquad{} \quad{}\\ \overline{\qquad{}\qquad{} -9x^3}\qquad{}\qquad{}\qquad{}\qquad{}\]

If we continue with this pattern we should obtain a quotient of \(-5x^3 - 9x^2 + 33x + 45\) and a 0 remainder. This means 1 is a root (zero) of \(j\).

As you must have noticed, doing this division is rather involved, but if we take a closer look we will see a pattern that simplifies this process.

Two things to take note of in this pattern: the leading term of our divisor is only used to clear terms in our dividend (equating them to zero). The other thing to note is that the variables do not matter in our division as long as the terms are complete and in decreasing order. What is of concern to us are the coefficients of our dividend and the second term of our divisor.

Given these facts, we should see that the leading term of our quotient (\(-5x^3\)) has the same coefficient as the leading term of our dividend (\(-5x^4\)) but with one degree less. The coefficient of our second quotient term (\(-9\)) is the difference of the second coefficient of our dividend (\(-4\)) and the product of the leading coefficient and the second term of our divisor (\(-5 * -1 = 5\)). The coefficient of the third term in our quotient (\(33\)) is the difference of the third coefficient of our dividend and the product of the second difference (\(-9\)) and the second term of our divisor (\(-9 * -1 = 9\)). The fourth term of our quotient (45) is the difference of the fourth coefficient of our dividend (\(12\)) and the product of the third difference (33) and the second term of our divisor (\(33 * -1 = -33\)).

To make this computation simple to work with, we will do additions rather than differences by using our \(p/q\) itself. For clarity, we will form a line with our \(p/q\) on the left and the coefficients of our dividend on the right. We can then take totals, after noting that our leading coefficient will always be the coefficient of the leading term of our dividend.

In essence we should have something like this

\[1\rfloor \qquad{}\qquad{} -5\quad{}-4\quad{}42\quad{}12\quad{}-45\quad{}\\ \qquad{}\qquad{}\qquad{} -5\quad{}-9\quad{}33\quad{}45\\ \text{_________________________________}\\ \qquad{}\qquad{} -5\quad{}-9\quad{}33\quad{}45\quad{}0\]

Since we have a remainder of zero, which is what we got with our long division, this reasoning is correct and we can do the same with the other \(p/q\).

But before running through all the other \(p/q\)’s, let us appreciate a few facts from this output. A remainder of zero means we have reduced our fourth-degree polynomial by one degree, leaving a third-degree polynomial.

\[h(x) = -5x^3 - 9x^2 + 33x + 45\]

Something else to note is that we can take our new function \(-5x^3-9x^2+33x+45\) and use other methods to locate zeros. But with the efficiency of computer programs, we can easily run through all our \(p/q\)’s using our simplified method and establish that -3 is also a root of \(j\).

We are therefore left with two other roots to determine.

coeffs <- c(-5, -4, 42, 12, -45)
n <- length(p_over_q)
remainder <- sapply(1:n, function (i) 
   ((((coeffs[1]*p_over_q[i]) + coeffs[2]) *
   p_over_q[i] + coeffs[3]) *
   p_over_q[i] + coeffs[4]) *
   p_over_q[i] + coeffs[5])
remainder
##  [1]         0.000      -144.000     -2560.000    -32256.000   -257040.000
##  [6] -20782080.000       -40.960       -24.192        36.864       -16.000
## [11]         0.000     -1680.000    -26640.000   -230400.000 -20054160.000
## [16]       -45.696       -36.864        40.320
zeros1 <- p_over_q[which(remainder == 0)]

To get these last two roots we need to use our new function \(h\). This function has different coefficients (-5, -9, 33, and 45) but has the same \(p/q\)’s since its constant and leading coefficient have the same factors. Therefore, running through all our \(p/q\)’s again we get -3 as a root of \(h\).

Since we did not get our last two roots, we can use the reduced function obtained from dividing out -3.

This new function is a second-degree polynomial

\[-5x^2 + 6x + 15\]

This is now a much simpler function to work with, as there are quite a number of methods for determining the roots of a second-degree polynomial. One method is the quadratic (second-degree polynomial) formula which we will discuss later, but for purposes of solving our second-degree polynomial we will mention how it is used.

For a general quadratic equation

\[ax^2 + bx + c = 0 \qquad{} a \ne 0\]

\(x\) can be solved with this quadratic formula

\[x = \frac{-b \pm \sqrt{b^2 - 4ac}}{2a}\]

Therefore, from our quadratic equation \(-5x^2 + 6x + 15\), \(a = -5, \text{ }b = 6 \text{ and } c = 15\).

We substitute these values in our formula

\[x = \frac{-6 \pm \sqrt{6^2 - 4(-5)(15)}}{2(-5)}\]

x1 <- (-6 + sqrt((-6)^2 - 4*-5*15))/(2*-5)
x2 <- (-6 - sqrt((-6)^2 - 4*-5*15))/(2*-5)
hx_at_zero <- c(-3, -1.23303, 1, 2.43303)

Output from this formula are our last two roots -1.2330303 and 2.4330303.

In conclusion, \(j(x) = 0\) or roots of \(j\) occurs when \(x = -3, -1.2330, 1 \text{ and } 2.43303\)

A.6 Scientific Notation

Large and small numbers are often expressed in exponential form for ease of writing and manipulation. This exponential form is said to be in “Scientific notation”.

Numbers expressed in scientific notation are expressed as:

\[a * 10^x \qquad{} 1 \leqslant a < 10\]

where

  • \(a\) a decimal value and
  • \(x\) is an integer

This means a finite decimal value can be expressed as a product of a number between 1 and 10 and an integer exponent of base 10. A positive exponent means the number is greater than or equal to 10, a negative exponent means the number is greater than 0 but less than 1, while a zero exponent means the number is greater than or equal to 1 but less than 10.

A simple way to think of this is to count how many places the decimal point moves (base 10). If the exponent is positive, then we move the decimal point to the right; if the exponent is negative, we move the decimal point to the left.

Examples:

Decimal Notation Scientific Notation
100 \(1 \times 10^2\)
1,000 \(1 \times 10^3\)
9,600,000,000 \(9.6 \times 10^9\)
0.2 \(2 \times 10^{-1}\)
0.0000036 \(3.6 \times 10^{-6}\)

Most calculators and statistical programs can calculate in either decimal or scientific notation, but they often output scientific notation when a value is very large or very small. Their scientific notation is often in the form of a decimal value followed by the letter \(e/E\), then the exponent with its sign, for example 3.6e-06 for 0.0000036.
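
R switches between the two notations with format() (or globally via the scipen option); a small sketch:

format(9600000000, scientific = TRUE)
## [1] "9.6e+09"
format(0.0000036, scientific = TRUE)
## [1] "3.6e-06"
format(3.6e-06, scientific = FALSE)
## [1] "0.0000036"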

A.7 Rational Exponents and Radicals

In this section we discuss:

  1. Fractional exponents
  2. nth Root of Real Numbers
  3. Rational Exponents and Radicals
  4. Properties of Radicals

A.7.1 Fractional exponents

When a number is raised to a fraction or when an exponent is a fraction, it is called “fractional exponent”. For example:

  • \(9^{1/2}\),
  • \(81^{1/3}\)
  • \(10000^{1/4}\).

Below we see how fractional exponents are actually nth roots of the base.

A.7.2 Nth Root of Real Numbers

While an exponent is the number of times a value is multiplied by itself, the \(nth\) root reverses this and recovers the original value. For example, for \(9^2 = 81\), 2 is the exponent and 81 is the output; to get the original value 9 back from this output, we take the square root of 81. Similarly, in \(10^4 = 10000\), taking the fourth root of \(10000\) gets us back to \(10\).

Nth is a generalization of the root's order, like the 2nd and 4th roots in our examples above. There are two frequently used roots: the “square root” (2nd root) and the “cube root” (3rd root).

Incidentally, the word “root” is used to mean origin, just like a tree which originates from its roots.

Positive numbers have two real nth roots if \(n\) is even, like the 2nd, 4th, and 6th. For example, 16 has -4 and 4 as its square roots and 10000 has -10 and 10 as its fourth roots.

Negative numbers have no real nth root if n is even; this means it is an error to take an even nth root of a negative number like \(\sqrt{-2}\) or \(\sqrt[4]{-10000}\). However, there is a complex number system used specifically for taking nth roots of negative values. We shall not discuss this system here as it is well beyond the scope of this introductory or refresher session. The reason an error occurs is that no real number raised to an even exponent can be negative. For odd n, there is only one real nth root, for example \(\sqrt[3]{-100} \approx -4.6\) or \(\sqrt[5]{-960} \approx -3.9\)

A.7.3 Rational Exponents and Radicals

There are two ways to represent an nth root: with the radical symbol \(\sqrt{}\) or as a fractional exponent.

The first representation, using the radical symbol, is called an nth-root radical. The \(\sqrt{\quad{}}\) symbol is called a radical, \(n\) is referred to as the index of the radical, and \(a\) is the radicand:

\[\sqrt[n]{a}\]

Fractional exponents are also nth roots because of the first property of exponents, that is, \(b^mb^n = b^{m+n}\).

The logic here is that, if you raise a number to a fraction \(1/n\) and multiply the result by itself \(n\) (the denominator) times, you get back the original value, since multiplying values with a similar base means adding their exponents.

Take for example \(27^{1/3}\); if we multiply this value by itself three times, the first property of exponents tells us we need to add its exponents, that is, \(\frac{1}{3} + \frac{1}{3} + \frac{1}{3} = 1\), and any number raised to 1 is that number.

Based on this fact, and the fact that the nth root gets us back the original value, this fractional exponent is thus an nth root. What we mean is that \(27^{1/3}\) is equivalent to \(\sqrt[3]{27}\), which outputs 3.

For any real number \(a\) raised to a rational exponent \(x/n\) in lowest terms (\(x\) and \(n\) share no common prime factors), and where \(a\) is non-negative whenever \(n\) is even, rational exponents can be defined as:

\[a^{x/n} = \begin{cases} (a^{1/n})^x & = (\sqrt[n]{a})^x\\ (a^x)^{1/n} & = (\sqrt[n]{a^x}) \end{cases}\]

and

\[a^{-m/n} = \frac{1}{a^{m/n}} \quad{} a \neq 0\]

Examples:

  1. \(27^{2/3} \quad{} = (27^{1/3})^2 \quad{} = (\sqrt[3]{27})^2 \quad{} = 9\)
  2. \(27^{2/3} \quad{} = (27^2)^{1/3} \quad{} = \sqrt[3]{27^2} \quad{}= 9\)
  3. \(27^{-2/3} \quad{} = \frac{1}{27^{2/3}} \quad{} = \frac{1}{9}\)

Note:

  • When the index is 2, it is often omitted, so the expression has only a radical and a radicand, for example \(\sqrt{16}\)
  • If there are two real nth roots, the positive root is usually the one given; this is called the principal nth root.
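
In R, fractional exponents are written directly with ^. Note that for a negative base R’s ^ returns NaN even when n is odd, so an odd root of a negative number has to be built from the root of its absolute value; a sketch:

27^(2/3)                    # same as (27^(1/3))^2
## [1] 9
27^(-2/3)                   # reciprocal of the above
## [1] 0.1111111
(-8)^(1/3)                  # R does not take odd roots of negatives directly
## [1] NaN
sign(-8) * abs(-8)^(1/3)    # workaround for an odd root of a negative number
## [1] -2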

A.7.4 Properties of Radicals

Following on properties of exponents, properties of radicals aid in changing or simplifying radicals.

If \(n\) and \(m\) are natural numbers greater than or equal to 2, and if \(x\) and \(y\) are positive real numbers, then:

  1. \(\sqrt[n]{x^n} \quad{} = x\)
  2. \(\sqrt[n]{xy} \quad{} = \sqrt[n]{x} \sqrt[n]{y}\)
  3. \(\sqrt[n]{\frac{x}{y}} \quad{} = \frac{\sqrt[n]{x}}{\sqrt[n]{y}}\)

Examples of simplification using properties of radicals:

  1. Using initial property: \(\sqrt{(4x^{3}2y)^2} \quad{} = 4x^{3}2y\)
  2. Using second property: \(\sqrt[3]{64} \sqrt[3]{8} \quad{} = \sqrt[3]{64 * 8} \quad{} = \sqrt[3]{512} \quad{} = \sqrt[3]{8^3} \quad{} = 8\)
  3. Using third property: \(\sqrt{\frac{xy}{100}} \quad{} = \frac{\sqrt{xy}}{\sqrt{100}} \quad{} = \frac{\sqrt{xy}}{10}\)

A.8 Linear Equations and Inequalities in One Variable

Linear equations are written as:

\[y = mx+b\]

or as an inequality

\[y \geqslant mx+b\]

Both equations are first-degree polynomials. Inequality equations have symbols \(<\), \(>\), \(\leq\), \(\geq\) instead of \(=\).

A value substituted for a variable in an equation to make it true is called a solution. For instance, 2 is a solution in equation below:

\[10 = 3x + 4 \qquad{} \to x = 2\]

The set of all solutions is called a solution set, hence solving an equation/inequality means finding its solution set.

Two equations are said to be equivalent if they have the same solution set. The idea of equivalent equations is used to transform an equation so that its solution is simpler to get. For example, the equation below and the simpler equation \(-2x = -4\) derived from it are equivalent

\[4x + 6 = 6x + 2\]

That is because they have the same solution, \(x = 2\), which we obtain when we solve for \(x\).

\[\text{Putting like terms together: }\qquad{} 4x - 6x = 2 - 6\]

\[ -2x = -4 \qquad{} \therefore x = 2\]

We can confirm 2 is a solution for both equations by substituting it for \(x\).

\[4(2) + 6 = 6(2) + 2 \qquad{}\text{which evaluates to } \quad{}14=14\]

To solve linear equations, the following equality properties are used:

  1. Addition and Subtraction property: Same quantity is added or subtracted from each side of a given equation
  2. Multiplication and division properties: Same non-zero quantity is multiplied or divided on each side of a given equation

Examples of solving linear equations:

\[5x - 2(3x + 4) = 2x - 2(3x - 3.5)\]

Opening brackets we get:

\[5x - 6x - 8 = 2x - 6x + 7\]

Putting like terms together we get:

\[-x - 8 = -4x + 7\]

Using addition and subtraction properties we get:

\[4x - x = 8 + 7\]

Using division property we get:

\[x = \frac{15}{3} \qquad{} \therefore x = 5\]

We can confirm this is a solution for these equations by substituting 5 for \(x\)

\[5(5) - 2(3*5 + 4) = 2*5 - 2(3*5 - 3.5) \qquad{} \text{which evaluates to }\quad{} -13 = -13\]
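
We can let R do the same check, and even find the solution numerically with uniroot() by treating the difference of the two sides as a function; a sketch:

lhs <- function(x) 5*x - 2*(3*x + 4)
rhs <- function(x) 2*x - 2*(3*x - 3.5)

# Confirm x = 5 satisfies the equation
lhs(5) == rhs(5)
## [1] TRUE

# Or solve numerically: the root of lhs(x) - rhs(x)
uniroot(function(x) lhs(x) - rhs(x), interval = c(-10, 10))$root
## [1] 5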

A.9 Quadratic Equation

When there is one variable, quadratic equation can be expressed as:

\[ax^2 + bx + c = 0 \qquad{} a \neq 0\]

Where:

  • \(x\) is a variable and
  • \(a\), \(b\), and \(c\) are constants

This equation is referred to as a standard form (note terms are in decreasing order).

Core methods of solving quadratic equations are: solving by square root, solving by factoring, and using the quadratic formula.

A.9.1 Solving quadratic equations by square root

This method is used when a quadratic equation has no first-degree term, that is, \(bx\) is absent. Such an equation is expressed as:

\[ax^2 + c = 0 \qquad{} a \neq 0\]

Solving for \(x\) is done by square rooting both sides, that is:

\[\text{if} \quad{} a^2 = b, \quad{} \text{then} \quad{} a = \pm \sqrt{b}\]

For example, for this equations:

\[3x^2 - 81 = 0\]

we solve for \(x\) by adding 81 to both sides and dividing by 3 before taking the square root of both sides.

\[\sqrt{x^2} = \pm \sqrt{\frac{81}{3}}\]

This leads us to:

\[x = \pm \sqrt{27}\]

To confirm this, we can substitute value obtained for \(x\),

\(3\sqrt{27^2} - 81=\) 0.

A.9.2 Solving quadratic equations by factoring

To factor a number means to get pairs of numbers whose product outputs the original number. These pairs are referred to as factors, and the process of determining factors is referred to as factoring.

As an example, the number 10 has two pairs of factors (four factors in total) whose product is 10; these pairs are 2 and 5 (2 x 5 = 10) and 1 and 10 (1 x 10 = 10).

Here is a table with all factors for numbers 1 through 10.

Number Factors
1 1
2 1, 2
3 1, 3
4 1, 2, 4
5 1, 5
6 1, 2, 3, 6
7 1, 7
8 1, 2, 4, 8
9 1, 3, 9
10 1, 2, 5, 10

For quadratic equations of the form

\[x^2 + bx + c = 0\]

Where \(a = 1\)

solving for \(x\) involves finding factors of the constant term (\(c\)) which add up to the middle coefficient (\(b\)). These factors are used to form two expressions whose product outputs the original quadratic equation. These expressions are of the form

\[(x + m)(x + n)\]

where \(m\) and \(n\) are our two factors.

Let us look at an example of solving for \(x\) using factoring methods.

Example 1

Given

\[x^2 + 6x + 9 = 0\]

we can begin solving for \(x\) by factoring it as

\[(x + 3)(x + 3) = 0\]

Note, product of these equations should output original equation.

With that we can now solve for \(x\) by dividing both sides by \((x + 3)\) and then adding \(-3\):

\[\frac{(x + 3)(x + 3)}{(x + 3)} = \frac{0}{(x + 3)}\]

\[x + 3 = 0\] \[x + 3 - 3 = 0 - 3\] \[\therefore x = -3\]

Implication of sign of \(b\) and constant term \(c\)

Sign of \(b\) and \(c\) have an implication on factors to be used.

  • If \(c\) is positive, both factors are negative or both are positive; specifically, if \(b\) is:
    • positive, then factors should be positive
    • negative, then factors should be negative

Overall these factors must add up to \(b\)

  • If \(c\) is negative then factors have alternating signs, that is, if \(b\) is:
    • positive, then larger factor is positive
    • negative, then larger factor is negative

Overall, these factors should be \(b\) units apart.

Example 2

Given

\[x^2 + 2x - 8 = 0\]

then our two equations are

\[(x + 4)(x - 2)\]

Since \(c\) is negative and \(b\) is positive.

Solving for \(x\) we get a solution set with \(x = -4\) or \(x = 2\).

When quadratic equation has a leading coefficient \(a\), and all coefficients (\(a\), \(b\) and \(c\)) have a shared factor, then factoring will begin by reducing these coefficients with their greatest shared factor.

Example 3

Given

\[4x^2 - 24x + 36 = 0\]

We can solve for \(x\) by factoring 4 out as it is our greatest shared factor:

\[4(x^2 - 6x + 9) = 0\]

Factoring brackets:

\[4(x - 3)(x - 3) = 0\]

Dividing both sides by \(4(x-3)\)

\[x - 3 = \frac{0}{4(x-3)}\]

This thus leads to \(x = 3\).
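
Base R’s polyroot() finds the roots of any polynomial from its coefficients (given in increasing order of degree), which is a handy cross-check for hand factoring; a sketch using Examples 2 and 3 (the roots come back as complex numbers with essentially zero imaginary parts):

# x^2 + 2x - 8 = 0: expect roots 2 and -4
polyroot(c(-8, 2, 1))

# 4x^2 - 24x + 36 = 0: expect a repeated root at 3
polyroot(c(36, -24, 4))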

A.9.3 Solving using a quadratic formula

When a quadratic equation cannot be solved by either square root or factoring methods, then a quadratic formula is used. To reason out this formula we need to begin by grasping an important concept known as completing squares.

A.9.3.1 Completing square

This converts a quadratic equation into a perfect square such that a standard form or general quadratic equation like

\[ax^2 + bx + c = 0\]

becomes

\[(x + A)^2 = B\]

Where \(A\) and \(B\) are constants.

Core concept of completing squares is to make left side of a quadratic equation a perfect square which enables us to use square root method to solve for \(x\).

Left side of a quadratic equation is transformed to a square by moving constant term across equals sign and then getting a third term which makes left side a square of a binomial (polynomial with two terms).

For us to make left side a square of a binomial, we need to take note and check for possible pattern from an existing and well known square of a binomial, that is:

\[(a + b)^2\]

Its expanded form is a trinomial (polynomial with three terms)

\[(a+b)^2 = (a + b)(a + b) = a^2 + 2ba + b^2\]

Since our left side would be left with two terms once we have taken our constant across the equals sign, we need a third term. From our trinomial above we can see the third term is the square of half the coefficient of \(a\) in the middle term, that is, \((\frac{1}{2} \cdot 2b)^2 = b^2\).

As an example, given

\[x^2 - 4x -2 = 0\]

We can make our left side a square of a binomial by taking the constant \(-2\) across the equals sign and adding to both sides of our equation the square of half the coefficient of the second term \(-4x\).

\[x^2 - 4x = 2\]

Square of half of 4 is four, therefore

\[x^2 - 4x + 4 = 2 + 4\]

Since \(x^2 - 4x + 4\) is \((x-2)^2\), then

\[(x-2)^2 = 6\]

\(x\) is therefore \(2 \pm \sqrt{6}\)

When a quadratic expression cannot be expressed as a perfect square (the coefficient of the second term in the trinomial is not twice the second term in the binomial), then it can be expressed as a sum of a square and a constant.

For example, quadratic equation:

\[x^2 - 12x + 40\]

can be expressed with \((x - 6)^2\) but only if we added a constant 4. Adding this constant to make a perfect square is why this method is referred to as completing the square.

A.9.3.2 Formulating the Quadratic Formula

From the foregoing discussion, we know that the third term in the expansion of a square of a binomial is the square of half its second term’s coefficient. The quadratic formula is derived using this knowledge to solve for \(x\).

Therefore, from our standard quadratic equation

\[ax^2 + bx + c = 0 \qquad{} a \neq 0\]

We begin by eliminating the coefficient of \(x^2\), that is, dividing through by \(a\).

\[\frac{a}{a}x^2 + \frac{b}{a}x + \frac{c}{a} = \frac{0}{a}\]

We then move our constant \(\frac{c}{a}\) to our right side.

\[x^2 + \frac{b}{a}x = -\frac{c}{a}\]

Now we make our left side a perfect square by adding square of half coefficient of \(x\) in our second term, which is

\[(\frac{1}{2} * \frac{b}{a})^2 = \frac{b^2}{4a^2}\]

\[x^2 + \frac{b}{a}x + \frac{b^2}{4a^2} = \frac{b^2}{4a^2} - \frac{c}{a}\]

We can now convert our left side to a perfect square.

\[x^2 + \frac{b}{a}x + \frac{b^2}{4a^2} = (x + \frac{b}{2a})^2\]

We also can simplify our right hand side.

\[ \frac{b^2}{4a^2} - \frac{c}{a}= \frac{b^2 - 4ac}{4a^2}\]

Our equation now looks like this;

\[(x + \frac{b}{2a})^2 = \frac{b^2 - 4ac}{4a^2}\]

Square rooting both sides we get:

\[x + \frac{b}{2a} = \pm \sqrt{\frac{b^2 - 4ac}{4a^2}}\]

We can simplify our right hand side as:

\[\frac{\pm \sqrt{b^2 - 4ac}}{\pm \sqrt{4a^2}} = \frac{\pm \sqrt{b^2 - 4ac}}{2a}\]

Finally we can solve for \(x\) by subtracting \(\frac{b}{2a}\) from both sides.

\[x = \frac{-b \pm \sqrt{b^2 - 4ac}}{2a}\]

This is what is referred to as a quadratic formula. It can be used to solve for \(x\) in any quadratic equation when we cannot use square root or factoring methods.

It is useful to note, \(b^2 - 4ac\) under our radical (\(\sqrt{\quad{}}\)) is called a discriminant. It gives information about expected solution, that is,

  • a positive discriminant has two real solutions,
  • a zero discriminant has one real solution and
  • a negative discriminant has no real solution*

* In the latter case, no real number can be obtained because we cannot take the square root (or any even root, like the 4th or 6th) of a negative number.
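
The formula and the discriminant cases translate directly into a small R function; a sketch (the name quad_solve is made up here):

quad_solve <- function(a, b, c) {
  disc <- b^2 - 4 * a * c
  if (disc < 0) return(NULL)           # negative discriminant: no real roots
  x1 <- (-b + sqrt(disc)) / (2 * a)
  x2 <- (-b - sqrt(disc)) / (2 * a)
  c(x1, x2)                            # the two values coincide when disc == 0
}

quad_solve(1, 2, -8)                   # x^2 + 2x - 8 = 0
## [1]  2 -4
round(quad_solve(-5, 6, 15), 4)        # the quadratic from section A.5.5
## [1] -1.233  2.433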

A.10 Other Pre-Calculus Topics

In this sub-section we want to revisit these mathematical concepts: sequences, series and summation; arithmetic and geometric sequences; and factorials and the binomial theorem.

A.10.1 Sequences, Series and Summation

A.10.1.1 Sequences

Sequences are often referred to as a successive list of numbers; however, a sequence is really a function which outputs this successive list of numbers or elements.

For example a sequence of even numbers can be given by

\[a(n) = 2n + 2\]

where \(a(n)\) is used instead of \(j(x)\)

Outputs of this sequence are called terms of the sequence, for example \(a(1) = 4\) is our first term, \(a(2) = 6\) is our second term, and \(a(3) = 8\) is our third term. For this sequence, the ordered list of elements is 4, 6, 8, and so on. This ordered list of elements is what is commonly, though strictly speaking wrongly, referred to as a sequence.

Note, for our sequence \(a(n) = 2n + 2\), we can also refer to it in an abbreviated form such as {2n + 2}. Also note, since this sequence has no upper limit (it is not finite), we refer to it as an infinite sequence. However, if a sequence has an upper limit, then we would refer to it as a finite sequence, for example if we limited \(a(n)\) to \(2 \leq n \leq 100\).

A.10.1.2 Series and Summation Notation

A series is an operation consisting of the addition of an ordered finite or infinite sequence of terms, like \(a_1 + a_2 + a_3 + a_4\) for a finite series and \(a_1 + a_2 + a_3 + ...\) for an infinite series. Here the series is the operation of adding the \(a_i\), one after the other.

Series are often conveniently written using summation notation (\(\sum\)), with the summing index and its limits written below and above the summation sign, like

\[\sum_{i = 1}^{10}\]

where \(i\) indicates the summing index, which begins at 1 and ends at 10. The summing index need not be represented by the letter \(i\); other letters like \(k\) or \(j\) are also used.

Here is an example of a finite series using the letter \(a\) as a summing index:

\[\sum_{a=2}^6 a^2 = 2^2 + 3^2 + 4^2 + 5^2 + 6^2\]

\[ = 4 + 9 + 16 + 25 + 36 = 90\]
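The same series can be computed in R:

sum((2:6)^2)   # 90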

Summation notation for alternating series

When we have an alternating sequence (terms alternating between positive and negative), we write an alternating series.

For example, for this sequence

\[a_k = (-1)^{k-1} (a_{k-1})^2 \qquad{} \text{with } a_1 = 2\]

we can write the corresponding series (starting from \(a_1 = 2\)) as

\[a_1 + \sum_{k=2}^{4} (-1)^{k-1}(a_{k-1})^2 = 2 - 4 + 16 - 256\]

A good example where summation of a series is used is in denoting the computation of the arithmetic mean, that is:

\[\bar{x} = \frac{1}{n} \sum_{k=1}^{n} x_k\]

Where:

  • \(x_k\) are the individual values
  • \(n\) is the number of values
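As a quick illustration in R (using a small made-up vector of values), the mean can be computed directly from this summation definition:

x <- c(4, 9, 16, 25, 36)
sum(x) / length(x)   # the summation definition of the mean: 18
mean(x)              # same result with the built-in function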

A.10.2 Arithmetic and Geometric Sequences

A.10.2.1 Arithmetic sequences

Arithmetic sequences have a constant difference “d” between consecutive terms. That means, for the terms

\[a_1, a_2, a_3, ..., a_n, ...\]

there is a constant difference such that,

\[a_n - a_{n-1} = d\]

or

\[a_n = a_{n-1} + d \qquad{} \text{for every } n > 1\]

Good example of arithmetic sequences are odd and even numbers, they both have a constant difference of two. Therefore,

odd series is given by:

\[\sum_{k=0}^n 2k + 1\]

and even series is given by:

\[\sum_{k=0}^n 2k + 2\]

A.10.2.2 Geometric sequences

Geometric sequences have a constant ratio “r” between terms. This means, for any term:

\[a_1, a_2, a_3, ..., a_n, ...\]

there is a constant ratio “r” such that:

\[\frac{a_n}{a_{n-1}} = r\]

An example of terms in a geometric sequence is 2, 4, 8, 16, 32, which has a constant ratio of 2; that is, 4/2, 8/4, 16/8 and 32/16 all equal 2.

A.10.2.3 Nth-Term Formulas

Given that \(\{a_k\}\) is an arithmetic sequence with a common difference \(d\) between its terms, we can build a pattern for an nth term. That is, we know:

\[a_2 = a_1 + d\]

\[a_3 = a_2 + d = a_1 + 2d\]

\[a_4 = a_3 + d = a_1 + 3d\]

then,

\[a_n = a_1 + (n-1)d \qquad{} \text{for all } n > 1\]

Similarly, we can formulate an nth term for a geometric sequence by taking note of its pattern, that is:

\[a_2 = a_1r\]

\[a_3 = a_2r = a_1r^2\]

\[a_4 = a_3r = a_1r^3\]

\[\therefore a_n = a_1r^{n-1} \qquad{} \text{for all } n > 1\]
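Both nth-term formulas are easy to check in R; here we generate the first five odd numbers (\(a_1 = 1\), \(d = 2\)) and the first five terms of the doubling sequence (\(a_1 = 2\), \(r = 2\)); the variable names are ours:

# arithmetic sequence: a_n = a_1 + (n - 1) * d
a1 <- 1; d <- 2; n <- 1:5
a1 + (n - 1) * d        # 1 3 5 7 9

# geometric sequence: a_n = a_1 * r^(n - 1)
a1 <- 2; r <- 2
a1 * r^(n - 1)          # 2 4 8 16 32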

A.10.3 Factorials and Binomial Theorem

A.10.3.1 Factorial

This is the product of a positive integer and all positive integers below it; it is denoted by \(n!\), a notation introduced by Christian Kramp. More compactly, the factorial function is defined as a product, that is:

\[n! = \prod_{k=1}^nk\]

Where:

  • \(n\) is the integer whose factorial is being computed
  • \(k\) runs over the integers from 1 up to \(n\)
  • \(\prod\) means product

This means:

\[1 * 2 * ... * (n-2) * (n-1) * (n)\]

But it is easier to think computationally as:

\[n * (n-1) * (n-2) * ... * 2 * 1\]

For example, \(6!\) means a product of 6 and all values below it, that is:

\[6! = \prod_{k=1}^6 k = 6 * 5 * 4 * 3 * 2 * 1\]

this should output 720.

\(1!\) and \(0!\) both equal 1; the latter is by convention, since the product of no numbers at all is taken to be 1.

Factorials are applicable in many mathematical and statistical concepts. For example, in algebra we use them in the coefficients of the binomial formula, and in combinatorics (which we will cover in our probability chapter) they are used to determine the number of different ways of arranging n things and to facilitate expression manipulation.

It might be of interest to note that:

\[n! = n * (n-1)! \qquad{} n \geq 1\]

This can make some computations efficient like:

\[\frac{4!}{3!} = \frac{4 * 3!}{3!} = 4\]
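In R, factorials are available through the built-in factorial() function (or as an explicit product):

factorial(6)                    # 720
prod(1:6)                       # 720, computed as a product
factorial(4) / factorial(3)     # 4, since 4! = 4 * 3!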

A.10.3.2 Binomial Coefficient

One important formula we will use in our combinatorics section under the probability chapter is the binomial coefficient. The binomial coefficient counts the number of ways \(k\) objects can be selected from \(n\) objects, for example how many ways a class of 10 students can select two representatives; order here does not matter (choosing Hellen and Janice or Janice and Hellen is the same selection). So in a way we are looking at how many ways elements can be grouped without considering their order. This is often described as n choose k, though some texts use different letters like \(r\) instead of \(k\).

There are a number of ways to denote binomial coefficient, these include:

\[C_{(n,k)} = C_{n,k} = {_n}C_k = {_n}C^k = C_k^n = \binom{n}{k}\]

Of these, the standard notation is \(\binom{n}{k}\), although \(C_n^k\) is often used for its typing/writing convenience.

To understand the formula for the binomial coefficient, let us revisit the class of 10 students from which we want to select two “reps”. The expectation is that these two posts will be held by different people, such that no student can occupy both posts.

The question now is, how many ways can we select two different students, a unique pair of students, for these posts without considering their order?

Let’s reason it out: we have 10 students and two posts, and for the first post all 10 students have an equal chance of filling it. Now suppose one student is selected; s/he is removed from possible selection for the next post, hence post two has 9 possible candidates. So we can say there are 90 (10 * 9) possible ways of filling these two posts. But let’s take note that these possibilities count the same pair in a different order as two different pairs, so “Hellen - Janice” and “Janice - Hellen” would be counted as two different pairs yet they are the same pair. We therefore need to figure out how to count only unique pairs.

Consider a hat with 3 letters, \(a, b \text{ and } c\), from which we want to select two letters. If we use the earlier reasoning, we have 3 * 2 or 6 possible ways of selecting these letters. These are:

matrix(c("a","a","b","b","c","c","b","c","a","c","a","b"), ncol = 6, byrow = TRUE)
##      [,1] [,2] [,3] [,4] [,5] [,6]
## [1,] "a"  "a"  "b"  "b"  "c"  "c" 
## [2,] "b"  "c"  "a"  "c"  "a"  "b"

Notice the pairs in columns 1 and 3, 2 and 5, and 4 and 6 are the same; they just have a different order. To get unique pairs we need to find out how many ways we can arrange two letters and divide our earlier count by that number. For example, given letters \(a\) and \(b\), how many ways can we arrange them? Here we have two objects and two spots; the first spot has 2 possibilities and the second has 1, hence we have 2 * 1 or 2 arrangements (\(a\) & \(b\) or \(b\) & \(a\)). To grasp this, suppose we had 4 letters and 4 spots; how many ways could we arrange them? It would be \(4 * 3 * 2 * 1\) or \(4!\).

Going back to our three letters, we now know there are 6 ordered pairs and 2 ways of arranging two objects, hence we can get all possible pairs without considering their order by dividing 6 by 2. Therefore there are 3 unique pairs, or three ways of selecting two letters from a total of 3 letters.

combn(letters[1:3], 2)
##      [,1] [,2] [,3]
## [1,] "a"  "a"  "b" 
## [2,] "b"  "c"  "c"

Back to our example on students: we know we have 90 possibilities, and to get unique pairs we need to divide by 2 * 1 or simply \(2!\). Therefore there are 45 possible ways of filling the two posts out of 10 students.

We can now generalize this reasoning to a formula. Given that this is the quotient of the total ordered selections by the number of ways these selections can be arranged, our numerator will be:

\[n * (n-1) * (n-2) * ... * (n - (k - 1))\]

where \(n\) is total number of objects and \(k\) is number of objects to be selected

and our denominator will simply be \(k!\), hence our formula will be:

\[\frac{n*(n-1)*(n-2)*...*(n-(k-1))}{k!}\]

We can improve on our numerator so it does not look so long or incomprehensible. Recall \(4!\) can be written as \(4 * 3!\); let’s use this concept to turn our numerator into factorials. For our class example we had \(10 * 9\) as our numerator; if we made it a factorial \(10!\) then we would have \(10*9*8*7*6*5*4*3*2*1\) or simply \(10*9*8!\). We need to eliminate this \(8!\), and mathematically we can do this by dividing by the same number, that is \(8!\). Since \(8!\) is really \((n - k)!\), we can revise our formula to something simpler:

\[\frac{n!}{k!(n-k)!}\]

This is what is referred to as a binomial coefficient and we will use it to formulate binomial theorem.
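R provides this computation through the built-in choose() function; as a check against our class example:

choose(10, 2)                                   # 45 ways to pick 2 reps from 10 students
factorial(10) / (factorial(2) * factorial(8))   # 45, the same value from the formula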

A.10.3.3 Binomial Theorem

In our previous section on polynomials, we saw how to expand a binomial \((a + b)^2\) to \(a^2 + 2ab + b^2\). Here we look at how to expand a binomial for any exponent.

To do this we begin by taking note of pattern formed with exponents 1 through 5.

\[(a+b)^1 = a + b\]

\[(a+b)^2 = a^2 + 2ab + b^2\]

\[(a+b)^3 = a^3 + 3a^2b + 3ab^2 + b^3\]

\[(a+b)^4 = a^4 + 4a^3b + 6a^2b^2 + 4ab^3 + b^4\]

\[(a+b)^5 = a^5 + 5a^4b + 10a^3b^2 + 10a^2b^3 + 5ab^4 + b^5\]

From above outputs, we can observe this pattern:

  1. Number of terms in each expression is \(n+1\): the first has 2 terms, the second has three terms, the third has four terms and so on.
  2. The exponent of \(a\) decreases from left to right while that of \(b\) increases from left to right. For example, in our third expression (\(a^3 + 3a^2b + 3ab^2 + b^3\)) we see that the first term (\(a^3\)) has \(a\) raised to 3 while \(b\) is raised to 0, the second term (\(3a^2b\)) has \(a\) raised to 2 while \(b\) is raised to 1, the third term (\(3ab^2\)) has \(a\) raised to 1 while \(b\) is raised to 2, and the final term (\(b^3\)) has \(a\) raised to 0 while \(b\) is raised to 3. Basically \(a\)’s exponents run 3, 2, 1, 0 while \(b\)’s run 0, 1, 2, 3.
  3. In each term, exponents sum up to \(n\). As an example, looking at \(a^3 + 3a^2b + 3ab^2 + b^3\), we see that initial term has exponents 3 and 0 totaling to 3, second term has 2 and 1 totaling to 3, third term has 1 and 2 totaling to 3, and fourth term has 0 and 3 totaling to 3.
  4. The first and last terms have a coefficient of 1 and the second term has a coefficient of \(n\); for subsequent terms we multiply the coefficient of the preceding term by the exponent of \(a\) in that term and then divide by the index/position of that (previous) term. For example, to get the coefficient of the third term in our fifth expression, we multiply the previous term’s coefficient 5 by the exponent of \(a\), 4, to get 20 and divide it by 2, the index (position) of that term, hence we get 10. To get the coefficient of the fourth term we multiply 10 by 3 to get 30 and divide by 3 to get 10.

Based on this observed pattern, let’s try and expand \((a+b)^6\)

First we are expecting our expression to have 7 terms, that is \(6 + 1\)

\[\frac{}{1} \frac{}{2} \frac{}{3} \frac{}{4} \frac{}{5} \frac{}{6} \frac{}{7}\]

Second, we know the exponents of \(a\) are decreasing from 6 to 0 while those of \(b\) are increasing from 0 to 6. We also know that for each term these exponents sum to \(n\). For now we will use \(\mathbb{N}\) as a placeholder for our coefficients.

\[a^6 + \mathbb{N}a^5b + \mathbb{N}a^4b^2 + \mathbb{N}a^3b^3 + \mathbb{N}a^2b^4 + \mathbb{N}ab^5 + b^6\]

For the coefficients, we know the initial coefficient is 1 and the second coefficient is 6 (same as \(n\)). Each subsequent coefficient is the product of the previous coefficient and the exponent of \(a\), divided by the index of the previous term; therefore we get \((6*5)/2=\) 15, \((15*4)/3=\) 20, \((20*3)/4=\) 15, \((15*2)/5=\) 6 and \((6*1)/6=\) 1.

\[\therefore (a+b)^6 = a^6 + 6a^5b + 15a^4b^2 + 20a^3b^3 + 15a^2b^4 + 6ab^5 + b^6\]

Now we can formulate a formula given our observed pattern.

This time let us start with our coefficients; we want to generalize a simple computational expression. For reference and reasoning purposes, let’s use the coefficients of \((a+b)^6\) expanded, that is \(6, 15, 20, 15\) and \(6\), with \(n\) equal to 6. We obtained these coefficients by multiplying the preceding coefficient by the exponent of \(a\) and then dividing by the index/position of the term. That is:

For the second term we had a coefficient of 6, which is the product of the first term’s coefficient 1 and \(a\)’s exponent 6. We divided this by the index/position of the previous term, 1, giving us 6. 6 is actually our \(n\), hence we can generalize the second term’s coefficient as

\[\frac{n}{1}\]

Our third term had a coefficient of 15, which was obtained by multiplying 6 (our \(n\)) by 5 (which is \(n-1\)) and dividing by 2, the position of the previous term. For now let us take the third term’s coefficient as follows, but we will amend it slightly later.

\[\frac{n(n-1)}{2}\]

For our fourth term, we had a coefficient of 20 which we obtained by multiplying the previous coefficient 15 by 4 (the exponent of “a”) and dividing by 3 (the position of the previous term). Basing our computation on \(n\), and thereby linking the computation back to the start, we get

\[\frac{\frac{6 *5}{2} * 4}{3} = 20\]

notice that if we move the 2 into the denominator we essentially get

\[\frac{6 * 5 * 4}{1 * 2 * 3}\]

Symbolically we can represent this as

\[\frac{n(n-1)(n-2)}{1 * 2 * 3}\]

Given these, we should be able to see an interesting pattern. The numerator is a product of \(n\) and the terms of a sequence of integers decreasing from \(n\), with as many factors as the index of the previous term. For example, the third term’s numerator is \(6(6-1)\) which is \(6 * 5\), the fourth term’s numerator is \(6(6-1)(6-2)\) which is \(6 * 5 * 4\), and based on this pattern the fifth term’s numerator will be \(6 * 5 * 4 * 3\). This sequence, that is \(n, (n-1), (n-2), ..., (n-k+1)\), where \(k\) is the index/position of the previous term, can be simplified by converting it to factorials: we divide \(n!\) by \((n-k)!\) (recall that \(n! = n(n-1)!\)). That will be our numerator.

The denominator is also a product of the terms of a sequence: for the second term it is 1; for the third term it is the product of the index of the first term, 1, and the index of the second term, 2; for the fourth term it is the product of the indices of the first, second and third terms. Basically it is the factorial of the index of the previous term (\(k\)): for the second term \(1!\), for the third term \(2!\), for the fourth term \(3!\) and so on.

What we have now is literally a binomial coefficient, that is:

\[\frac{n!}{k!(n-k)!}\]

This means coefficient of any term in an expanded binomial can be obtained by this binomial coefficient.

For our variables, we know the first variable \(a\) has its exponents decreasing from \(n\) while those of \(b\) increase from 0. To get the exponent of \(a\) in any term we subtract the index of the previous term \(k\) from \(n\), that is \(a^{n-k}\). For \(b\), the exponent is simply the index of the previous term \(k\).

We can now combine all we have noted to form this expression:

\[(a+b)^n = C_{n,0}a^n + C_{n,1}a^{n-1}b + C_{n,2}a^{n-2}b^2 + ... + C_{n,n}b^n\]

We can reduce it by using summation notations

\[(a + b)^n = \sum_{k=0}^n \binom{n}{k}a^{n-k}b^{k}\]

This is what is referred to as the Binomial Theorem, and we can use it to expand a complete binomial or find a specific term in an expanded binomial expression. For example, given \((y - 1)^{20}\) we can get its tenth term as follows.

For this example, \(n\) is 20 and \(k\) equals 9 (the index of the previous term).

\[\binom{20}{9}y^{20-9}(-1)^9 = \frac{20!}{9!\,(20-9)!}\,y^{11}(-1)\]


This should output \(-167960y^{11}\).
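We can confirm the coefficient in R:

choose(20, 9) * (-1)^9   # coefficient of the tenth term: -167960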

A.11 Elementary Functions

In this section we refresh on one of mathematics’ core concepts, that is “functions”. As a build-up to this concept we look at the Cartesian coordinate system and point-by-point graphing.

A.11.1 Cartesian Coordinate System

A Cartesian (rectangular) coordinate system is made up of two real number lines, one horizontal and one vertical. Together these number lines are called coordinate axes and individually the horizontal and vertical axis. They cross each other at their origin, which is 0. The horizontal axis is referred to as the x axis while the vertical axis is referred to as the y axis. The coordinate axes divide the plane into four parts called quadrants.

Points on a plane like “P” are plotted where their vertical and horizontal lines intersect. For point “P” the vertical and horizontal lines intersect at (5, 5). These two numbers, written as an ordered pair, are the coordinates of point “P”. For any point, its first coordinate is referred to as the abscissa and its second coordinate as the ordinate. Sometimes coordinates of a point are referred to in terms of the axis labels, that is the x coordinate and the y coordinate. The origin is the point with coordinates (0, 0).

An important point to note is that there is a one-to-one correspondence between points in a plane and elements in a set of all ordered pairs of real numbers. This note is what is known as fundamental theorem of analytic geometry.

A.11.2 Point-by-Point Graphing

Point-by-point plotting is a process of sketching the graph of an equation. To sketch a graph of an equation we plot enough points on a coordinate system like a Cartesian plane and connect these points with a smooth curve until the graph’s apparent shape is visible. The points are ordered pairs from the equation’s solution set.

For example, for this equation

\[x = 2y + 3\]

Given the integers 0 through 10 as \(y\), we can compute \(x\) as

\(x = 2 \times (0, 1, \ldots, 10) + 3 = 3, 5, 7, 9, 11, 13, 15, 17, 19, 21, 23\)

and plot \(x\) against \(y\):

y <- 0:10
x <- 2*y + 3
plot(y, x, type = "l", col = 4, panel.first = grid(20))
title("Plot of x = 2y + 3")

A.11.3 Introduction to Functions

A.11.3.1 Definition

In mathematics, a function is a rule, process or method which links one set of elements to another set of elements. This link is what is called a correspondence.

In mathematical functions, for each element in the first set there exists one and only one correspondence (link/relationship) in the second set. Elements in the initial set are referred to as the domain, and those in the second set as the range.

Using these two terms (domain and range), we can define a function as a correspondence in which each element in our domain corresponds to one and only one element in our range.

As an example, each cat has a heart rate measured in beats per minute (bpm). For this example the domain would be a specific cat and its range would be that cat’s beats per minute (bpm). This example can be expressed as a mathematical function. However, if we were to look at cats (domain) and their owners (range), then this might not be defined as a mathematical function. That is because there is a possibility of a cat having more than one owner.

Another example: if the domain is the natural numbers \(\mathbb{N}\) and their range is their squares, then this can be considered a mathematical function; however, if the range of these natural numbers \(\mathbb{N}\) were their square roots instead of squares, then it would not be considered a mathematical function. This is because squaring outputs only one value while square roots output two (negative and positive) values.

A.11.3.2 Functions and Equations

Equations are statements of equality containing one or more variables. The equality relates the contents on the left and right hand sides of an equals sign \(=\). We have been using such equations, like:

\[y = x^3 - 2(x) \qquad{} x > 1\]

In this equation, \(x\) is our input and \(y\) is our output; therefore, given a certain value of \(x > 1\), we expect to get an output \(y\). We therefore note \(y\) is dependent on \(x\), and \(x\) is independent since any value can be used as long as it is greater than 1. This is why \(x\) is called an independent variable and \(y\) is called a dependent variable.

The question now is, when do equations specify a function? Recall the definition of a function: each domain element can have one and only one range element. If we take the domain to be the inputs of an equation and the range to be their outputs, then an equation defines a function when each input has one and only one output.

Our earlier example \(y = x^3 - 2x, \quad{} x > 1\) outputs only one value for each input, hence it specifies a function. But \(y = \pm\sqrt{x}\) (taking both square roots) is not a function, as it gives two outputs for each positive \(x\).

Graphically, if every vertical line intersects a graph at no more than one point, then to each \(x\) value there corresponds at most one \(y\) value, which means the equation is a function.

For example, here is the plot for the equation \(y = x^3 + 2x\).

If there is more than one point at which a vertical line intersects the graph, then there are \(x\) values with more than one \(y\) value and therefore the equation is not a function.

x <- seq(0, 10, 0.001)   # domain values for the square-root plot
y <- sqrt(x)
y2 <- -sqrt(x)

plot(x, y, ylim = range(c(y, y2))+0.2, type = "l", col = 4, panel.first = grid(15))
title(expression(paste("y = ", sqrt(x))))
abline(v = 0, col = "grey60")
abline(h = 0, col = "grey60")
lines(x, y2, col = 4)

A.11.3.3 Function Notation

Functions are named with alphabetical letters like \(f\) and \(g\). For example:

\[f: y = x^3 + 2x\]

\[g: y = \sqrt{x}\]

To specify domain value \(x\) for function \(f\) we often use symbol \(f(x)\) read as “\(f\) of \(x\)” or “\(f\) at \(x\)” thus replacing output or range \(y\) (\(y = f(x)\)). This is what we refer to as function notation.

Writing our two functions in function notation we get

\[f(x) = x^3 + 2x\]

\[g(x) = \sqrt{x}\]

Now we can get a specific range value or output of \(f\) or \(g\); for example, for \(x = 4\), \(f(4) = 4^3 + 2(4) = 72\) and \(g(4) = \sqrt{4} = 2\). We need to take note that if \(x\) is not in the domain of \(f\), then \(f(x)\) does not exist; we commonly say \(f\) is not defined at \(x\).

The \(f(x)\) notation can also be used to evaluate expressions which involve the function at one or more inputs.

For example, given this expression, with two function evaluations in its numerator,

\[\frac{f(x + y) - f(x)}{y}\]

And given this

\[f(x) = x^3 + 3x + 2\]

we can evaluate the expression at \(x = a\)

\[\frac{f(a + y) - f(a)}{y} = \frac{((a + y)^3 + 3(a + y) + 2) - (a^3 + 3a + 2)}{y}\]
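As a numerical illustration (a small sketch; the names f, a and y below are chosen just for this example), we can evaluate the difference quotient in R for a small increment:

f <- function(x) x^3 + 3*x + 2

# difference quotient (f(a + y) - f(a)) / y at a = 2 with a small increment y
a <- 2; y <- 0.001
(f(a + y) - f(a)) / y    # approximately 15, the rate of change of f near x = 2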

A.12 Graphs and Transformations

In this section we go through some elementary functions often encountered in mathematics, then go over vertical and horizontal shifts, as well as reflections, expansions and contractions, and conclude with piecewise-defined functions.

A.12.1 Some core functions

There are six basic functions often encountered in mathematics: the identity function, absolute value function, square function, cube function, square root function, and cube root function.

Below are graphs of these functions, it is important to know their definition, domain, range and shape.

We begin with the identity function, whose range is equal to its domain; the \(x\) values equal the \(y\) values and both are real numbers \(\mathbb{R}\).

Second plot is for an absolute function. An absolute value is a value’s distance from zero and denoted as \(|x|\) for any value \(x\). It can be defined as:

\[f(x) = |x| = \begin{cases} x &\text{if } x > 0 \\ 0 & \text{if } x = 0 \\ -x & \text{if } x < 0 \end{cases}\]

An absolute value function is like \(g(x) = |x|\), where the domain \(x\) is the real numbers \(\mathbb{R}\) and the range is the interval from 0 to infinity, \([0, \infty)\).

Our third plot is for a square function, that is \(h(x) = x^2\). The domain for this function is the real numbers \(\mathbb{R}\) and the range is the interval from 0 to infinity, \([0, \infty)\).

The fourth plot is for a cube function \(m(x) = x^3\). Both its domain and range are the real numbers \(\mathbb{R}\).

m <- x^3

plot(x, y, type = "n", panel.first = grid(10), ann = FALSE)
abline(v = 0, col = "grey60")
abline(h = 0, col = "grey60")
lines(spline(x, m), col = 4)
title(expression(paste("m(x) =", x^3)))
title(xlab = "x", ylab = "m", line = 2)

The fifth plot is of the square root function \(n(x) = \sqrt{x}\), with domain and range both being the interval from 0 to infinity, \([0, \infty)\).

n <- sqrt(x[x >= 0])

plot(x, y, type = "n", panel.first = grid(10), ann = FALSE)
abline(v = 0, col = "grey60")
abline(h = 0, col = "grey60")
lines(spline(x[x >= 0], n), col = 4)
title(expression(paste("n(x) = ", sqrt(x))))
title(xlab = "x", ylab = "n", line = 2)

The final plot is for a cube root function \(p(x) = \sqrt[3]{x}\), with domain and range being the set of all real numbers \(\mathbb{R}\).

negroots <- -abs(x[x < 0])^(1/3)
p <- c(negroots, x[x >= 0]^(1/3))

plot(c(-3, 3), c(-3, 3), type = "n", panel.first = grid(10), ann = FALSE)
abline(v = 0, col = "grey60")
abline(h = 0, col = "grey60")
lines(spline(x, p), col = 4)
title(expression(paste("p(x) = ", sqrt(x, 3))))
title(xlab = "x", ylab = "p", line = 2)


par(mar = op)

A.12.2 Vertical and Horizontal Shifts

Transformation occurs when a new function is formed by performing an operation on another function. Transformation involves:

  • moving a graph from one position to another while maintaining its shape
  • re-sizing the graph

A.12.2.1 Transformation by maintaining shape of graph

When a transformation does not involve changing the shape of a graph, it is done by either rotating (turning), reflecting (flipping) or translating (moving/sliding) the graph.

A.12.2.1.1 Rotating

Rotation of a graph means turning a graph around a center. Note, distance from this center to any point on the shape remains the same.

A.12.2.1.2 Reflecting

This means “flipping” a graph across a line called a “mirror/central line”. When a graph is reflected, the reflected image has the same shape as the original and every point is the same distance from the central or mirror line. The mirror line can be in any direction, that is diagonal, horizontal, or vertical.

Here are two examples of triangles reflected across a diagonal and horizontal central line.

A.12.2.1.3 Translation

Translation simply means to move. In this case it’s the graph being moved, and this movement does not involve rotation, re-sizing or anything else which would change the shape of the graph. Translations of graphs are also referred to as shifts.

Translation can either be horizontal or vertical, and referred to as Horizontal and Vertical translation.

Here’s a vertical translation of the function \(f(x) = x^3\).

Here’s a horizontal translation for a cube function \(f(x) = x^3\).

A.12.2.2 Transformation by Resizing

Re-sizing involves dilation, contraction, compression, and enlargement/expansion while maintaining shape of a graph.

In order to maintain the shape of a graph, re-sizing begins by drawing a line from a central point to each point on that graph. A resized graph is then produced by increasing or decreasing the distance by a similar proportion from each point along these lines.

Here’s an example of re-sizing; it can be an expansion or a contraction depending on the original shape. If the original shape was the inner box, then the re-sizing was an expansion; if the original shape was the outer box, then the re-sizing was a contraction.

A.12.3 Piecewise-defined functions

Piece-wise-defined functions are functions which behave differently depending on input (x) value. Two examples of these kind of functions are:

  • Different average score obtained by a student depending on number of hours spent revising and practicing
  • Absolute function

Suppose the average score obtained by a student who has been studying and practicing for:

  • zero and just about five hours is 40%,
  • five and just about ten hours is 60%,
  • ten hours up to twelve point five hours is 80%
  • twelve point five hours and above is 100% (maximum score)

We could represent this in a piece-wise function defined as follows:

\[S(t) = \begin{cases} 8t & \text{if } 0 \leqslant t < 5\\ 6t & \text{if } 5 \leqslant t < 10\\ 8t & \text{if } 10 \leqslant t \leqslant 12.5\\ 100 & \text{if } t > 12.5 \end{cases}\]

Where:

  • \(S(t)\) is score function (score is dependent on time spent revising and practicing)
  • \(t\) is time in hours
t <- seq(0, 12.5, 0.001)
t1 <- t[t >= 0 & t < 5]
t2 <- t[t >= 5 & t < 10]
t3 <- t[t >= 10 & t <= 12.5]
t4 <- c(12.501, 14, 15, 16)

plot(c(0, 17), c(0, 101), type = "n", xlab = "Time (hours)", ylab = "Estimated Score")
lines(t1, t1*8, col = 4)
lines(t2, t2*6, col = 4)
lines(t3, t3*8, col = 4)
lines(t4, c(100, 100, 100, 100), col = 4)
points(c(0, 5, 10, 12.5), c(0*8, 5*6, c(10, 12.5)*8), pch = 21, bg = 4)
points(c(5, 10), c(5*8, 10*6), pch = 21)
title("Piecewise-defined functions")

Note, a filled (coloured) dot means the point is part of the graph while an open (transparent) dot means the point is not part of the graph.

A.12.4 Linear Functions

A.12.4.1 Intercepts

Intercepts are points at which the graph of a function crosses an axis. The point at which a graph crosses the \(x\) axis is referred to as the x-intercept and where it crosses the \(y\) axis is referred to as the y-intercept.

For example, if the graph of a function crosses the x axis at x value 2, then 2 is that graph’s x-intercept. If the graph crosses the y axis at y value 3, then 3 is that graph’s y-intercept. Note, if a y-intercept exists, then 0 is in the domain of that function and the y-intercept can be referred to as \(h(0)\) for a function \(h\).

A.12.4.2 Linear functions, equations and Inequalities

Linear functions produce graphs with a straight line. These functions are expressed as:

\[f(x) = mx + b \qquad{} m \neq 0\]

Where:

  • \(m\) and \(b\) are \(\mathbb{R}\)
  • Domain (\(x\)) is a set of all real numbers \(\mathbb{R}\)
  • Range (\(f(x)\)) is also a set of all real numbers \(\mathbb{R}\)

A constant function is one where \(m = 0\), hence it’s function is given by:

\[f(x) = b\]

Where:

  • Domain is a set of all real numbers \(\mathbb{R}\)
  • Range is a constant \(b\)

Linear functions are first-degree polynomial functions (also known as first-degree functions), while constant functions are polynomials of degree zero.

Note:

  • Since a function cannot have a domain (\(x\)) with two ranges (\(y\)), then graphs of functions cannot produce a vertical line.
  • A constant function produces a horizontal straight line

This means a linear function is a straight line that is neither horizontal nor vertical.

A linear relationship can involve more than one variable. A linear equation in two variables is written as:

\[\text{Standard Form } Ax + By = C \]

Where

  • \(A\), \(B\) and \(C\) are real constants
  • \(A \text{ and } B\) are not both 0

Standard form can be rewritten as a linear function (provided \(B \neq 0\)):

\[y = -\frac{A}{B}x + \frac{C}{B}\]

A.12.4.3 Slope of a line

Slope is a measure of the steepness of a line relative to \(x\); more specifically it is the ratio of the change in \(y\) to the change in \(x\). That is, given two points \(P_1(x_1, y_1)\) and \(P_2(x_2, y_2)\), the slope is:

\[m = \frac{y_2 - y_1}{x_2 - x_1} \qquad{} x_1 \neq x_2\]

Where \(m\) denotes slope

Change in \(x\) or horizontal change is also called Run while change in \(y\) or vertical change is also called Rise. Slope can thus be defined as:

\[m = \frac{\text{Vertical } \Delta { (Rise)}}{\text{Horizontal } \Delta { (Run)}}\]

Where \(\Delta\) means change

Slope can be:

  • Positive if line is diagonal and \(y\) is increasing as \(x\) increases
  • Negative if line is diagonal and \(y\) is decreasing as \(x\) increases
  • Zero (0) if line is horizontal
  • Undefined if line is vertical

It can be shown that, given a linear equation \(y = mx + b\), the slope is equal to \(m\) while \(b\) is its \(y\) intercept; this is referred to as the slope-intercept form.

If we know the coordinates of two points on a line, or if we know a line’s slope and the coordinates of one point, we can find its equation using what is called the point-slope form. The point-slope form for a line with slope \(m\) that passes through \((x_1, y_1)\) is:

\[\frac{y - y_1}{x - x_1}=m \qquad{} \therefore y - y_1 = m(x - x_1)\]

A.12.5 Quadratic Functions

Quadratic functions are second degree functions, they can be defined as:

\[f(x) = ax^2 + bx + c \qquad{} a \neq 0\]

Where \(a\), \(b\) and \(c\) are real numbers and the domain is the set of all real numbers \(\mathbb{R}\).

Quadratic functions produce “U” or inverted-“U” shaped graphs called parabolas.

As an example, let us plot \(f(x) = -x^2 + 5x + 3\) and determine its intercepts.

x <- seq(-5, 9, 0.001)
fx <- function(x) -x^2 + 5*x + 3

plot(c(-5, 10), c(-10, 10), type = "n", xlab = "x", ylab = "f(x)")
lines(x, fx(x), col = 4)
title(expression(paste("f(x) = ", -x^2 + 5*x + 3)))
abline(h = 0, v = 0, lty = 2)
points(c(-0.54, 0, 5.54), fx(c(-0.54, 0, 5.54)), pch = 21, bg = c(4, 5, 4))
legend("topright", legend = c("x-axis", "y-axis"), pch = 21, pt.bg = 4:5)

The y-intercept is the value of \(y\) where the graph crosses the y-axis; at this point \(x = 0\), therefore for this graph the y-intercept is \(f(0) = -(0)^2 + 5(0) + 3 = 3\).

The x-intercepts are where \(f(x) = 0\), and we can solve for \(x\) using the quadratic formula.

Our input values for this formula are:

  • \(a\) = -1
  • \(b\) = 5
  • \(c\) = 3

\[\therefore x = \frac{-(5) \pm \sqrt{5^2 - 4(-1)(3)}}{2(-1)}\]

Computing all this we get:

\[x \approx -0.54 \text{ | } x \approx 5.54\]

We conclude by saying this parabola (the graph of our quadratic function) has two x-intercepts at about \((-0.54, 0)\) and \((5.54, 0)\) and one y-intercept at \((0, 3)\).

A.12.5.1 Properties of quadratic functions and their graphs

From a quadratic function

\[f(x) = ax^2 + bx + c \qquad{} a \neq 0\]

transforming it to

\[f(x) = a(x - h)^2 + k\]

can present a number of useful properties. We discuss these properties below, but first we see how to transform this quadratic function.

Transforming Quadratic Functions

Let us transform our earlier example

\[f(x) = 2x^2 + 8x + 16\]

to \(f(x) = a(x - h)^2 + k\).

We begin by grouping the terms containing \(x\) using parentheses; this is what we want to convert to a perfect square.

\[(2x^2 + 8x) + 16\]

Next we want to factor out coefficient of \(x^2\) which is 2.

\[2(x^2 + 4x) + 16\]

We want to make what is in the parentheses a perfect square, hence we use the concept of completing the square to get the last value \(c\). Whatever value we get, we need to subtract it outside the brackets for algebraic balance.

\[2(x^2 + 4x + \_) + 16 - \_\]

We know that to complete this square we need to take half of the second term’s coefficient and square it, thus \((\frac{4}{2})^2 = 4\). Note, this value 4 is multiplied by the 2 we factored out, making it 8. So we subtract this 8 from 16 to get:

\[2(x^2 + 4x + 4) + 8\]

Now we can transform what is within parenthesis to be a perfect square;

\[2(x + 2)^2 + 8\]

To make it match the form \(f(x) = a(x - h)^2 + k\), we write \(+ 2\) as subtracting \(-2\).

\[2(x - (-2))^2 + 8\]
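As a quick check that the two forms describe the same function, we can compare them in R over a range of values:

x <- seq(-10, 10, by = 0.5)
all.equal(2*x^2 + 8*x + 16, 2*(x - (-2))^2 + 8)   # TRUE: the two forms agree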

1. Determining direction and breadth of a parabola

We know a quadratic function has a curved graph, a parabola, which faces upwards or downwards. We can determine its direction by looking at its leading coefficient \(a\). If \(a\) is positive, then it is facing upwards; if it is negative, then it is facing downwards. For example, from our earlier quadratic function \(f(x) = 2x^2 + 8x + 16\), since \(a = 2\) we would know it is facing upwards even without plotting it.

\(a\) can also tell us if our parabola is narrow or wide. If \(|a|\) is greater than 1, then the parabola is narrower, as the function changes more rapidly (for example twice as fast when \(a\) is 2 or -2). If \(|a|\) is less than 1, like -0.5 or 0.5, then the parabola is wider, as the function increases or decreases at a slower rate.

2. Determining presence and number of x-intercepts

From a quadratic function’s discriminant, we can tell whether its graph has an x-intercept or not, and if so whether there is one or two. For \(b^2 - 4ac < 0\) the graph has no x-intercept, for \(b^2 - 4ac = 0\) there is exactly one x-intercept, and for \(b^2 - 4ac > 0\) the graph has two x-intercepts.

Locating parabola’s vertex

Each parabola has a lowest point if it is facing upward and a highest point if it is facing downward; this point is called the Vertex.

Reasoning intuitively from our second form of a quadratic function, \(f(x) = a(x - h)^2 + k\), the first term either adds to or subtracts from \(k\): if the initial term \(a(x - h)^2\) is positive, then we are adding to \(k\); if it is negative, then we are subtracting from \(k\); and if it is 0, then \(f(x) = k\).

With that in mind, over a range of domain values (\(x\)) we are either adding to or subtracting from \(k\), meaning from \(k\) we are either increasing (if positive) or decreasing (if negative). \(k\) is therefore our maximum point if it is a downward facing parabola, or our minimum point if it is an upward facing parabola.

To make \(f(x) = k\), we need to equate our first term (\(a(x - h)^2\)) to 0. We do this by making \(x = h\). That is:

\[f(h) = a(h - h)^2 + k = k\]

At this point \(f(h) = k\) the parabola is at its minimum if it is upward facing (\(a > 0\)) or at its maximum if it is downward facing (\(a < 0\)).

Therefore vertex point is given by (\(h, k\)).

For our example with equation \(2(x - (-2))^2 + 8\), vertex point is (\(-2, 8\)).

m <- -15:12
jan <- function(m) 2*(m - (-2))^2 + 8
plot(m, jan(m), type = "l", col = 4)
points(-2, 8, pch = 21, bg = 4)
text(-2, 50, "Vertex")

Line of symmetry

On one side of the vertex the parabola is increasing and on the other side it is decreasing. This means picking any two points on opposite sides at the same distance from the vertex gives the same \(f(x)\).

For example, moving 8 units below and 8 units above the vertex’s \(x\) value of \(-2\) gives us:

\[f(-10) = 2(-10 - (-2))^2 + 8 = 136\]

\[f(6) = 2(6 - (-2))^2 + 8 = 136\]

If we drew a line passing through the vertex, it would split the parabola into two halves; this line of symmetry is referred to as the axis.

plot(m, jan(m), type = "l")
pts <- c(-10, -2, 6)
points(c(-10, -2, 6), jan(pts), pch = 21, bg = c(5, 4, 6))
lines(c(-2, -2), c(0, 370), lty = "dashed")
text(-2, 390, "Axis")
mtext("vertex", 1, at = -2)

A.12.6 Exponential functions

Following up on natural exponents and basic functions, we know that

\[f(x) = 2^x\]

and

\[g(x) = x^2\]

are not similar.

\(f(x)\) is an exponential function and \(g(x)\) is a square function.

Generally we can write an exponential function as:

\[f(x) = b^x \qquad{} b > 0, \space{} b \neq 1\]

The domain for this function is the set of all real numbers \(\mathbb{R}\) and the range is the set of all positive real numbers. In this function we exclude base 1, as it outputs a constant. We also exclude zero and negative bases, as a negative base does not output a real number for every exponent.

x <- seq(-5, 5, 0.001)
y <- 2^(-x)
plot(x, y, type = "l", col = 4, xlab = "x", ylab = "y")

If we plot \(f(x) = 2^x\) (a base \(b > 1\)) and \(g(x) = 2^{-x}\), their graphs will be reflections of each other across the y-axis.

y1 <- 2^x
y2 <- 2^(-x)
plot(c(-6, 6), c(0, 33), type = "n", xlab = "x", ylab = "y")
abline(v = 0)
for(i in seq(5, 30, 5)){
   segments(-0.1, i, 0.1, i)
}
lines(x, y1, col = 4)
lines(x, y2, col = 5)

From these graphs it is clear that both pass through the point (0, 1). The reason is that any allowed base raised to the exponent 0 outputs 1, that is \(b^0 = 1\).

There are other useful properties for graphs with base greater than 1. These include:

  • All graphs of exponent functions pass through point (0, 1)
  • These graphs are continuous curves with no holes or jumps
  • The curve gets closer and closer to the x-axis but never reaches it; a line approached in this way is referred to as an asymptote.

A.12.6.1 Base e Exponential Function

Any positive real number (other than 1) can be used as a base in exponential functions; however, there is one particular base that is extensively used in a number of mathematical expressions and formulas modelling real-world phenomena: base \(e\).

Base \(e\) is an irrational number named after Leonhard Euler (1707-1783). As with other irrational numbers base \(e\) cannot be represented with any finite decimal fraction but it can be approximated.

Base \(e\) can be computed with the expression:

\[(1 + \frac{1}{x})^x\]

Using this expression we can compute some values of \(e\)

  \(x\)            \((1 + \frac{1}{x})^x\)
  1            2
  100          2.7048138
  10,000       2.7181459
  100,000      2.7182682
  1,000,000    2.7182805

Continuing with this computation, as \(x\) increases the output approaches an irrational number, about 2.7183, but never really reaches it; instead its decimal expansion continues on. This number is what is called \(e\); \(e\) to 15 decimal places is 2.718281828459045.
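We can reproduce this computation in R; exp(1) gives R’s built-in value of \(e\):

x <- c(1, 100, 10000, 1e6, 1e8)
(1 + 1/x)^x      # values climbing towards e as x grows
exp(1)           # 2.718282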

Let’s plot the exponential function with base \(e\) and with its reciprocal \(1/e\) as base, and see how they reflect each other across the y-axis.

x <- seq(-4, 4, 0.001)
y1 <- exp(x)
y2 <- exp(-x)

plot(c(-5, 7), c(0, 56), type = "n", xlab = "x", ylab = "y")
abline(v = 0)
for(i in axTicks(2)) {
   segments(-0.1, i, 0.1, i)
}
lines(x, y1, col = 4)
lines(x, y2, col = 5)
legend("topright", legend = c(expression(e^x), expression(e^-x)), lty = 1, col = 4:5)

Other than base \(e\), base 10 and base 2 are often used. Base 10 is often used as it corresponds to our base-10 number system.

A.12.7 Logarithmic functions

Logarithms are a widely used concept in many disciplines; in statistics they are often used to scale down (reduce) a wide range of quantities to a smaller scope.

The domain of a logarithm cannot include negative numbers, as there is no exponent to which a (positive) base can be raised to produce a negative value.

log(-100)
## [1] NaN

Before discussing logarithmic functions, we need to understand two concepts, these are one-to-one functions and inverse of a function.

A one-to-one function is a function in which each range value corresponds to exactly one domain value. That means no two domain values (\(x\)) correspond to the same range value (\(f(x)\)). A continuous function that is either increasing or decreasing for all its domain values is therefore a one-to-one function.

As an example, let us look at these two functions:

\[f(x) = 2^x\]

and

\[g(x) = x^2\]

\(f\) is a one-to-one function while \(g\) is not. This is because for any range value of function \(f\) there can only be one domain value, while for function \(g\) there can be two. For example, for a range value of 16, the domain value in function \(f\) would be 4, that is \(f(4) = 2^4 = 16\). In function \(g\), for a range value of 16 we have two possible domain values, 4 and -4, that is \(g(4) = 4^2 = 16\) as well as \(g(-4) = (-4)^2 = 16\).

An inverse of a function is the function formed when the inputs and outputs of a one-to-one function are interchanged. That means, if the point (x, y) is on a one-to-one function, then (y, x) is on its inverse. An inverse only applies if the function is a one-to-one function.

Logarithmic functions are inverse of exponential functions. Therefore, given:

\[y = 2^x\]

it’s inverse is:

\[x = 2^y \text{ or } y = log_2x\]

This equation finds the exponent to which 2 must be raised (how many times 2 is multiplied by itself) to get \(x\). This is a logarithmic equation with base 2. It is read as “logarithm of \(x\) to base 2”.

Note that \(y = log_2x\) holds exactly when \(x = 2^y\), and this is what we graph.

For example, inverse of \(y = 3^x\) is \(x = 3^y\). We can plot these two equations on the same coordinate plane given a domain set of {-3,-2,-1,0,1,2,3}

x1 <- seq(-5, 5, 0.001)
y1 <- 3^x1

plot(-5:10, -5:10, type = "l", lty = "dashed", xlab = "x", ylab = "y")
abline(h = 0, v = 0, col = 8)
lines(x1, y1)
lines(y1, x1, col = 4)

Let’s look at some examples of converting from logarithmic to exponential form.

Converting logarithmic to exponential

  1. \(log_{10}100 = 2\) corresponds to \(10^2 = 100\)
  2. \(log_{2}8 = 3\) corresponds to \(2^3 = 8\)
  3. \(ln \, 7.389056 = 2\) corresponds to \(e^2 = 7.389056\)

A.12.7.1 Properties of Logarithmic Functions

There are a number of handy properties of logarithmic functions; these are:

  1. \(log_b 1 = 0\)
  2. \(log_b b = 1\)
  3. \(log_b b^x = x\)
  4. \(b^{log_bx} = x, \quad{} x> 0\)
  5. \(log_b MN = log_bM + log_bN\)
  6. \(log_b \frac{M}{N} = log_bM - log_b N\)
  7. \(log_bM^p = \text{p } log_bM\)
  8. \(log_bM = log_bN \text{ if and only if } M = N\)

Just like exponential functions, bases \(e\) and 10 are the most frequently used logarithmic bases. The base 10 logarithm is called the Common logarithm while the base \(e\) logarithm is called the Natural logarithm.

In most calculators, the base 10 logarithm is labeled “log” while the base \(e\) or natural logarithm is labeled “ln” or “LN”. Hence the notation is:

Common logarithm: \(log \text{ x} = log_{10} x\)

Natural logarithm: \(ln\text{ x} = log_e x\)

In R, the function log() computes the natural logarithm by default but it can also be used for other bases. Functions log10() and log2() are specific to base 10 and base 2.
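For example:

log(exp(2))        # 2, natural logarithm (base e)
log10(100)         # 2, common logarithm (base 10)
log2(8)            # 3
log(8, base = 2)   # 3, any base via the base argument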

A.13 Introduction to Calculus

In this section we discuss an important mathematical topic referred to as Calculus. Calculus will come in handy when plotting graphs of different models and computing the area under a curve.

Generally, calculus involves change: how much something has changed. It has two core topics, derivatives and integration. While differentiation divides an object into smaller pieces to find change, integration joins these small pieces to find total change. Differentiation and integration are inverses of one another.

A.13.1 Derivative

As noted, differentiation involves dividing an object into smaller pieces so as to find its change. The question is when this is applicable and how we go about it.

Think of anything with a set of continuous values. For example, a jogger could have an average jogging speed of, say, 6 kph (kilometres per hour), but her actual speed would vary across the jogging track; hence if asked her speed at any particular point, it might differ from this average speed. With this in mind, to establish the speed at any particular point (distance) in time, we need to make some point estimates or computations. Getting the speed at a single exact point directly might not be possible, as we can reason out below.

Suppose we want to know the speed at which our jogger has been jogging between her second kilometre, reached at a time of 1/4 hr, and her third kilometre, reached at 1/2 hr. Given that speed is computed as change in distance over change in time, we can easily establish her speed between the 2nd and 3rd kilometre as:

\[Speed = \frac{3\text{km} - 2\text{km}}{\frac{1}{2}\text{hr} - \frac{1}{4}\text{hr}} = 4\text{ kph}\]

Now if we wanted to know her speed at exactly 2 kilometres, with the same time of 1/4 hr, what do we expect? Let us compute it and see.

\[Speed = \frac{2\text{km} - 2\text{km}}{\frac{1}{4}\text{hr} - \frac{1}{4}\text{hr}} = \frac{0}{0} = \text{undefined}\]

We have something that is undefined; therefore our next possible solution is to take small portions of distance and time covered just around our point of interest, 2 km at 1/4 hr. That is, go backwards a small distance from 2 kilometres and slightly ahead of 2 kilometres. The ratio of these small changes should be a good approximation of the speed at exactly two kilometres. The core issue here is to be as close as possible to 2 kilometres on either side without being at exactly 2 kilometres.

Jogging Track

Establishing this speed (at a distance close to 2 km but not exactly 2 km) is what differential calculus is all about.

Our particular aim in this subsection is to compute rates of change for curves, that is, non-linear functions. The core concepts we go through are rate of change, slope and limits.

A.13.1.1 Rate of change

In our preceding discussion, we saw how to measure change at a particular exact point. Based on this let us look at an example of a change in a quadratic function and formulate a general function for measuring change.

Suppose reproduction of a certain micro-organism can be modeled with this function (hypothetical example):

\[R(t) = t^2 \qquad{} t > 0\]

Where:

  • \(R\) means reproduction
  • \(t\) is time in minutes

We are given a graph of this function which depicts reproduction of the microorganism for times between 0 and 5 minutes. Now we seek to determine the rate at which micro-organisms are reproduced between 2 and 4 minutes. Here are our two points:

x <- seq(0, 5, 0.001)
y <- x^2
plot(x, y, type = "l", xlab = "Time (minutes)", ylab = "# of Microorganisms", col = 4)
title("Reproduction of microorganisms")
points(c(2, 4), c(2^2, 4^2), pch = 21, bg = 4)
segments(c(2, 4, 2), c(2^2, 2^2, 2^2, 2^2), c(4, 4, 4), c(2^2, 4^2, 4^2), lty = 2, lwd = 2)
text(2-0.7, 2^2+0.1, labels = "(2, 4)")
text(4-0.7, 4^2+0.1, labels = "(4, 16)")
text(3, 5.1, labels = 2, cex = 0.8)
text(3.7, 9, labels = 12, srt = 90, cex = 0.8)

The rate of reproduction between 2 minutes and 4 minutes is computed as the change in the number of microorganisms divided by the change in time, that is:

\[\frac{R(4) - R(2)}{4 - 2} = \frac{16 - 4}{4 - 2} = \frac{12}{2} = 6\]

We can use this example to formulate a general function for measuring change or more specifically what we refer to as average rate of change.

Given a function \(y = f(x)\), when a domain value \(x\) changes from \(a\) to \(a + h\) (\(h\) being our change), then \(y\) will change from \(f(a)\) to \(f(a + h)\). The average rate of change in \(y\) relative to the change in \(x\) is therefore given by:

\[\frac{f(a + h) - f(a)}{(a + h) - a} = \frac{f(a+h)-f(a)}{h} \quad{} h \ne 0\]

This mathematical expression is referred to as a difference quotient.

Now recall our jogger example; let us use this difference quotient to establish speed at a particular point, say 6 kilometres. Suppose the jogger's progress can be modeled by:

\[f(x) = x^3 - 2x^2\]

Based on our earlier discussion, to establish the speed at exactly 6 km we need a small change (\(h\)) before and after 6 km. This change should be small enough to get as close to 6 km as possible without being at exactly 6 km. We do this using small values of \(h\) (change) such as 0.1, 0.01 and 0.001 to compute the speed just before and just after 6 km.

##          5.9    5.99  5.999   6  6.001    6.01   6.1
## h      -0.10 -0.0100 -0.001   0  0.001  0.0100  0.10
## Speed  82.41 83.8401 83.984 NaN 84.016 84.1601 85.61

We can see as \(h\) approaches 0 on both sides, speed is approaching 84 kph. We can therefore approximate speed at exactly 6km to be about 84 kph.

##          5.9    5.99  5.999  6  6.001    6.01   6.1
## h      -0.10 -0.0100 -0.001  0  0.001  0.0100  0.10
## Speed  82.41 83.8401 83.984 84 84.016 84.1601 85.61
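A sketch of how such a table can be produced in R (the original code is not shown here; we simply evaluate the difference quotient at the chosen values of \(h\)):

f <- function(x) x^3 - 2*x^2

h <- c(-0.1, -0.01, -0.001, 0, 0.001, 0.01, 0.1)
speed_tab <- rbind(h = h, Speed = (f(6 + h) - f(6)) / h)
colnames(speed_tab) <- 6 + h
round(speed_tab, 4)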

Mathematically we can describe this by saying:

\[84 \text{ kph is a limit for average speed as } h \text{ approaches } 0\]

Note the word limit, which we can describe as the value a function or sequence approaches.

Symbolically we can express it as:

\[\frac{f(6 + h) - f(6)}{h} \to{84} \quad{} \text{as} \quad{} h \to{0}\]

Alternatively we can also express it as:

\[\lim_{h \to 0} \frac{f(6 + h) - f(6)}{h} = 84\]

This value 84 is also referred to as the instantaneous rate of change, or in physics the instantaneous velocity.

Symbolically if limit exists, we can express this instantaneous rate of change at \(x = a\) as:

\[\lim_{h \to 0} \frac{f(a + h) - f(a)}{h}\]

In most texts, instantaneous rate of change is usually shortened to just rate of change which should be distinguished from average rate of change.

A.13.1.2 Slope

Slope is also referred to as Gradient, and it is a measure of the steepness and direction of a line or part of a line connecting two points. A relevant part of a line is a secant, which is a line passing through two points on the graph of a function.

There are a number of ways to compute slope; here we will look at two methods. One computes the slope using the difference quotient, the other divides the change in the \(y\) direction by the change in the \(x\) direction.

Let us begin by looking at slope of secant line using difference quotient.

\[\text{Slope of secant line} = \frac{f(a + h) - f(a)}{h}\]

In the following graph, slope is the steepness of secant (blue) line passing through points \((a, f(a))\) and \((a+h, f(a + h))\).

As a specific example, we can graph function \(f(x) = x^3\), with a segment passing through point (\(6, f(6)\)) and (\(6+2, f(6+2)\)), when \(a = 6\) and \(h = 2\).

and compute slope of this secant line as:

\[= \frac{f(6+2) - f(6)}{2}= \frac{8^3 - 6^3}{2}= \frac{512 - 216}{2} = 148\]

In general, the slope of the graph of a function \(y = f(x)\) at the point \((a, f(a))\) is given by the formula for instantaneous rate of change, that is:

\[\lim_{h \to 0} \frac{f(a + h) - f(a)}{h}\]

Note: This is only possible if a limit exists (we will discuss in subsequent section instances where a limit would not exist).

When we have just one point, we can determine the slope at that point by computing the difference quotient for values very near it, below and above, without being exactly at it. The slope is the value we approach from the left and from the right.

Using the function \(f(x) = x^3\) and its graph, let’s determine this limit at \(a = 7\) as \(h\) decreases.

h <- c(-0.1, -0.01, -0.001, 0, 0.001, 0.01, 0.1)
y <- ((7 + h)^3 - 7^3)/h
y[4] <- ceiling(y[3])   # fill the undefined value at h = 0 with the apparent limit
matrix(y, nrow = 1, dimnames = list("fx", 7 + h))
##       6.9     6.99   6.999   7   7.001     7.01    7.1
## fx 144.91 146.7901 146.979 147 147.021 147.2101 149.11

With that we can say 147 is our limit and an approximation of the slope at this point, (7, \(7^3\)). A line passing through this point with this gradient is called a tangent line. Basically, a tangent is a line touching the graph at exactly one point.

We can now plot our tangent with a gradient of 147 passing through (7, \(7^3\)). With our gradient, we know that moving one unit left or right along the tangent changes the height by 147, hence at the point below 7, that is 6, the tangent is at \(7^3 - 147\) and at the point above, 8, it is at \(7^3 + 147\).

x <- 1:10
y <- x^3
plot(x, y, type = "l")
title(expression(paste("f(x) =", x^3)))
points(7, 7^3, pch = 21, bg = 4)
lines(6:8, c(7^3 - 147, 7^3, 7^3 + 147))   # tangent with gradient 147

Picking any two points on this line should output a gradient of 147.

The other method for computing slope, as mentioned, divides the distance covered in the \(y\) direction by that in the \(x\) direction. That is, given two points (\(x_1, y_1\)) and (\(x_2, y_2\)), we can compute the slope as:

\[\text{Slope} = \frac{Rise}{Run}= \frac{y_2 - y_1}{x_2 - x_1}\]

Note, slope or gradient is denoted by letter \(m\).

Using this method, we can compute the slope through our two points (6, \(6^3\)) and (8, \(8^3\)) as:

\[m = \frac{8^3 - 6^3}{8 - 6} = \frac{512 - 216}{2} = 148\]

A.13.1.3 Limits

A limit is a value we approach from the left and right of a domain value as we make the change (\(h\)) smaller and smaller. We can express a limit as:

\[\lim_{x \to c}f(x) = L\]

Or

\[f(x) \to L \quad{} \text{as } x \to c\]

We have discussed an existing limit as a value being approached as \(h\) greatly decreases, now let us look at an example where a limit does not exist.

We are given a function

\[f(x) = \frac{|x^3|}{x^3}\]

and we want to establish if

\[\lim_{x \to 0} \frac{|x^3|}{x^3}\]

exists or not.

Here we want to look at what values we get for \(f(x)\) as \(x\) approaches 0 but is not exactly 0 itself.

Using these values as our \(x\):

## -2 -1 -0.1 -0.01 -0.001 0 0.001 0.01 0.1 1 2

We can present our computed \(f(x)\) in this table:

##      -2 -1 -0.1 -0.01 -0.001   0 0.001 0.01 0.1 1 2
## f(x) -1 -1   -1    -1     -1 NaN     1    1   1 1 1

As expected, there is no value of \(f(x)\) at \(x = 0\), but what is of interest are the values on the left and right of 0. These values approach two different numbers: -1 on the left side and +1 on the right side. Clearly there is a disjoint here.

We refer to -1 as limit from the left and +1 as limit from the right, these two form concept of one-sided limits.

For our example, since the one-sided limits do not approach one and the same number, we conclude that there exists no limit at \(x = 0\). But we acknowledge that left and right limits exist. That is:

\[\lim_{x \to 0} \frac{|x^3|}{x^3} \quad{} \text{does not exist}\]

We can graphically represent this as:

h <- c(-2, -1, -0.1, -0.01, -0.001, 0, 0.001, 0.01, 0.1, 1, 2)   # x values used above
fx <- abs(h^3)/h^3                                               # corresponding f(x) values

plot(c(-2.2, 2.2), c(-1.5, 1.5), type = "n", xlab = "x approaching 0", ylab = "")
abline(v = 0, h = 0, col = 8, lty = "dashed")
lines(h[1:5]-0.003, fx[1:5], col = 4)
points(c(-0.001, 0.001), c(-1, 1), pch = 21, cex = 0.7, col = 4)
lines(h[7:length(h)]+0.005, fx[7:length(fx)], col = 4)
title("Limit does not exist")

From this graph we not only see a disjoint, we also see that \(f(x)\) at 0 does not exist (the blank point). Recall that a transparent point indicates the point is not part of the graph.

Generally, for a limit to exist, the left and right limits must exist and they must approach the same value.

Symbolically, we can represent left-hand limit as:

\[\lim_{x \to c^{-}} f(x) = K\]

Where

  • \(K\) is left-hand limit
  • \(x \to c^{-}\) means \(x\) approaches \(c\) from left side and
  • \(x < c\)

For example,

\[\lim_{x \to 0^{-}} \frac{|x^3|}{x^3} = -1\]

We can express right-hand limit as:

\[\lim_{x \to c^+} f(x) = L\]

Where

  • \(L\) is right-hand limit
  • \(x \to c^{+}\) means \(x\) approaches \(c\) from right side
  • \(x > c\)

For example

\[\lim_{x \to 0^{+}} \frac{|x^3|}{x^3} = 1\]

A.13.1.3.1 Properties of limits

Given two functions \(f\) and \(g\) whose limits \(L\) and \(M\) exist and are real numbers \(\mathbb{R}\):

\[\lim_{x \to c} f(x) = L \qquad{} \qquad{} \lim_{x \to c} g(x) = M\]

then

  1. Limit of an identity is equal to identity: \(\lim_{x \to c}x = c\)
  2. Limit of a sum is equal to the sum of the limits: \(\lim_{x \to c} [f(x) + g(x)] = \lim_{x \to c}f(x) + \lim_{x \to c}g(x) = L + M\)
  3. Limit of a difference is equal to difference of the limits \(\lim_{x \to c} [f(x) - g(x)] = \lim_{x \to c}f(x) - \lim_{x \to c}g(x) = L - M\)
  4. Limit of a function with a coefficient is equal to product of coefficient and limit: \(\lim_{x \to c} kf(x) = k \lim_{x \to c} f(x) = kL\) for any constant \(k\)
  5. Limit of a product is equal to products of all limits: \(\lim_{x \to c}[f(x).g(x)] = [\lim_{x \to c}f(x)][\lim_{x \to c}g(x)] = LM\)
  6. Limit of a quotient is equal to division of the limits as long as denominator is not 0: \(\lim_{x \to c}\frac{f(x)}{g(x)} = \frac{\lim_{x \to c}f(x)}{\lim_{x \to c}g(x)} = \frac{L}{M} \quad{} \text{if } M \neq 0\)
  7. Limit of an nth-root is nth-root of that limit as long as limit is greater than 0 for even roots: \(\lim_{x \to c} \sqrt[n]{f(x)} = \sqrt[n]{\lim_{x \to c}f(x)} = \sqrt[n]{L} \quad{} L > 0 \text{ for n even}\)

Using these properties, let us compute a limit:

\[\lim_{x \to 2}(2x^2 - x)\]

From first property we know

\[\lim_{x \to 2} x = 2\]

We also know that when a term has a coefficient, we compute the limit and then multiply by the coefficient, thus

\[\lim_{x \to 2}2x^2 = 2 . 2^2 = 2 . 4 = 8\]

\[\therefore \lim_{x \to 2}(2x^2 - x) = 8 - 2 = 6\]
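
A minimal numeric check of this result: evaluating \(2x^2 - x\) at values close to 2 on both sides shows the outputs settling around 6.

fx <- function(x) 2*x^2 - x
x <- c(1.9, 1.99, 1.999, 2.001, 2.01, 2.1)
round(fx(x), 4)   # the values close in on 6 from both sides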

From these examples, we can build a more general limit of a function \(f(x)\). That is, for \(f(x) = x^3 + 2x\) and any real number \(c\), we can write the limit

\[\lim_{x \to c}(x^3 + 2x)\]

as

\[c^3 + 2c\]

which is the function \(f\) evaluated at \(c\); that is, \(f(c)\).

\[\therefore \lim_{x \to c}(x^3 + 2x) = c^3 + 2c = f(c)\]

We can use this knowledge to evaluate any limit. For example, for a polynomial function of the form:

\[f(x) = a_nx^n + a_{n-1}x^{n-1} + ... + a_0\]

limit of \(f(x)\) is given by

\[\lim_{x \to c}f(x) = \lim_{x \to c}(a_nx^n + a_{n-1}x^{n-1} + ... + a_0)\]

\[= a_nc^n + a_{n-1}c^{n-1} + ... + a_0 = f(c)\]

We can therefore conclude by saying, limit of a polynomial function where \(c\) is any real number is expressed as

\[\lim_{x \to c} f(x) = f(c)\]

A.13.1.3.2 Limits of Difference Quotients

Given a function, we can determine its limit using the difference quotient. For example, given:

\[f(x) = 3x + 2\]

and the difference quotient at \(x = 2\)

\[\lim_{h \to 0} \frac{f(2 + h) - f(2)}{h}\]

We can determine this limit as

\[\lim_{h \to 0} \frac{f(2+h) - f(2)}{h} = \lim_{h \to 0}\frac{[3(2+h)+2] -[3(2)+2]}{h} = \lim_{h \to 0}\frac{(6+3h+2)-(6+2)}{h} = \lim_{h \to 0}\frac{3h}{h}\]

\[\therefore \lim_{h \to 0} \frac{f(2+h) - f(2)}{h} = 3\]
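
A minimal numeric check: evaluating the difference quotient for \(f(x) = 3x + 2\) at \(x = 2\) with ever smaller \(h\) always returns 3 (up to floating point error).

f <- function(x) 3*x + 2
h <- c(0.1, 0.01, 0.001, 0.0001)
(f(2 + h) - f(2))/h   # each value is 3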

Slope of a graph

Now recall how we determined the slope of a tangent line: we estimated its limit by taking smaller and smaller values of \(h\). Our \(h\), or change, was approaching 0 without ever being exactly 0. Given what we have learnt about limits, we can now compute the slope at any point without going through the process of reducing \(h\), by using the difference quotient.

For example, given the function \(y = f(x) = x^3 - 5x\) and the point (3, 12), we can determine the slope of its tangent using the difference quotient as follows:

\[\lim_{h \to 0} \frac{f(3+h) - f(3)}{h}\]

Substituting with function to solve for \(y\) we get:

\[\lim_{h \to 0} \frac{[(3 + h)^3 - 5(3+h)] - [3^3 - 5(3)]}{h}\]

From our binomial expansion section, we saw how to expand \((3+h)^3\) to \(27 + 27h + 9h^2 + h^3\). If we substitute this into our equation and make the necessary computations, we get:

\[\lim_{h \to 0} \frac{27 + 27h + 9h^2 + h^3 - 15 - 5h - 27 + 15}{h}\]

We can now simplify this to:

\[\lim_{h \to 0} \frac{22h + 9h^2 + h^3}{h}\]

Factoring out \(h\) in our numerator we get:

\[\lim_{h \to 0} \frac{h(22 + 9h + h^2)}{h}\]

The \(h\) in the numerator and denominator cancel out, leaving \(\lim_{h \to 0} (22 + 9h + h^2)\), which evaluates to 22; so the slope is 22.

We can confirm this by our earlier method of reducing \(h\).

yfx <- (((3+h)^3 - 5*(3+h)) - (3^3 - 5*3))/h
yfx[4] <- ceiling(yfx[3])   # nudge the value at h = -0.01 up to 22 for display
matrix(yfx, nrow = 1, dimnames = list("fx", 3+h))
##    1  2   2.9 2.99  2.999   3  3.001    3.01   3.1  4  5
## fx 8 14 21.11   22 21.991 NaN 22.009 22.0901 22.91 32 44
cat("\n", yfx[4], "is our limit as h approaches 0 and x approaches 3 \n")
## 
##  22 is our limit as h approaches 0 and x approaches 3

A.13.1.4 Derivative

Given our preceding discussion, we can now locate the slope of any secant or tangent line. For a tangent line, we need to establish its limit, which we agreed would suffice as an approximation of its slope. So for each point we have to compute a limit to get its slope.

Now let's consider a more efficient way to determine the limit at any point, given a function. We want to generalize one limit which we can use to determine the slope at any point on the graph. By doing this we are not only making the process efficient, but we are also establishing a relationship between a function and the slope of a tangent line at any point on the graph of that function.

For example, if we had the graph \(y = f(x) = x^2\), we can establish the limit for any point \(a\) using a two-step process. The initial step involves getting the slope of a secant line using the difference quotient, and step two involves getting the slope of the tangent.

Step 1

\[\frac{f(a + h)-f(a)}{h} = \frac{(a+h)^2-a^2}{h} = \frac{a^2+2ah+h^2-a^2}{h}\]

\[\therefore \frac{f(a+h) -f(a)}{h} = 2a + h \qquad{} h \ne 0\]

Step 2: Limit of difference quotient

Slope of any tangent line is given by:

\[= \lim_{h \to 0} \frac{f(a+h) - f(a)}{h} = \lim_{h \to 0}(2a+h) = 2a\]

Therefore for any graph of \(y = f(x) = x^2\), slope of any tangent line on this graph would be given by \(2a\).

Let us try to get slope of any tangent line with graph \(y = g(x) = x^3\).

Step 1

\[\frac{f(a+h)-f(a)}{h}= \frac{(a+h)^3-a^3}{h}= \frac{a^3 +3a^2h+3ah^2+h^3-a^3}{h}\]

\[\therefore \frac{f(a+h)-f(a)}{h} = 3a^2+3ah+h^2 \qquad{} h \ne 0\]

Step 2

Slope of a tangent line of this graph is given by:

\[\lim_{h \to 0}\frac{f(a+h)-f(a)}{h}= \lim_{h \to 0}(3a^2 + 3ah+h^2)= 3a^2\]

What we can note in both \(f(x) = x^2\) and \(g(x) = x^3\) is that the slope of any tangent line on their graphs is a function of \(a\), where \(a\) is the point of tangency (a point on the graph/curve).
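
We can check this pattern numerically: a minimal sketch comparing difference quotients of \(g(x) = x^3\) (with a small \(h\)) against the formula \(3a^2\) at a few tangency points.

g <- function(x) x^3
a <- c(-2, 1, 3)
h <- 0.000001
rbind(difference_quotient = (g(a + h) - g(a))/h, formula_3a_squared = 3*a^2)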

We now know how to generalize this process of determining slopes of tangent lines along a graph; next let us establish the relationship between this generalized slope and the function of its graph.

Relationship between slopes of tangent lines along a graph and the function of its graph

Given this plot

Let's determine the slope at the points with x values -4, -3, -2, -1, 0, 1, 2, 3 and 4.

Since this is a square function, we might expect the slope at any point on this graph to be \(2a\), or in this case \(2x\). That would mean the graph's steepness at each point is twice its domain value at that point.

For our graph, although we know it is a square function, we cannot be certain by mere observation that the steepness at each point is twice its domain value at that point. For example, look at these three graphs: they are all square functions, but they all have different slopes.

x <- seq(-4, 4, 0.001)
y1 <- function(x) 2*x^2
y2 <- function(x) x^2
y3 <- function(x) 0.5*x^2
plot(c(-5, 5), c(0, 40), type = "n", xlab = "x", ylab = "y")
lines(x, y1(x), col = 4)
lines(x, y2(x), col = 5)
lines(x, y3(x), col = 6)
legend("topright", legend = c(expression(2*x^2), expression(x^2), expression(0.5*x^2)), title = "f(x) =", lty = 1, col = c(1, 4, 5), cex = 0.8)

The slope in this case is determined by the coefficient of \(x^2\), which we referred to as \(a\). When this coefficient is greater than one, as in \(2x^2\), the graph is steeper, and when it is less than 1, as in \(0.5x^2\), it is less steep.

So we need to establish a function for the slope of any tangent line along our graph. To do this we use one vital clue from our graph: the vertex, or minimum point, (0, -2). We can use this point to determine the general slope of tangent lines along our graph using the square (vertex) form of a quadratic equation, that is \(a(x-h)^2 + k\).

From our discussion on vertices, we know \(h\) and \(k\) locate the vertex, and \(k\) is the distance from the x-axis to the minimum point, that is the \(y\) value, which is -2. Given this fact and \(x\) being 0, our equation is now:

\[a(0+(-2))^2 - 2 = 4a - 2\]

Taking 2 to the other side we get:

\[4a = 2\]

Dividing both sides by 4 we get:

\[a = \frac{2}{4} = 0.5\]

Therefore the general slope for our graph is not the standard \(2a\) but rather a fraction of it, that is \(0.5a\).

We can now compute slope for given \(x\) values:

x     Slope of tangent line at (x, f(x))
-4    -2
-3    -1.5
-2    -1
-1    -0.5
 0     0
 1     0.5
 2     1
 3     1.5
 4     2

Given the general slope of tangent lines along a graph, such as \(0.5a\), we want to determine the function of the graph. In other words, we want to establish the relationship of a slope's function to that of its graph's function.

A general slope function like \(0.5a\) tells us how steep our graph will be. Given that the standard square function \(f(x) = x^2\) has a general slope of \(2a\), any other general slope of a square function will be a fraction of this (less steep) or a multiple of this (steeper).

In that regard our graph’s steepness will be a fraction of 2, that is 0.5/2 which makes it 0.25.

We can now proceed to formulate our graph’s function in the form:

\[f(x) = a(x - h)^2 + k\]

As discussed, \(a\) tells us how steep our graph is, and we have determined it to be 0.25. \(h\) and \(k\), as mentioned during our discussion on vertices, are the horizontal and vertical translations of the vertex, that is, its distance from zero along the x and y axes. Looking at our graph, we have no horizontal translation since \(x = 0\), but we have a vertical translation of \(y = -2\).

We can therefore compile our function as:

\[f(x) = 0.25x^2 - 2\]

If we want to confirm this function has a general slope of \(0.5a\), we compute the slope of the tangent line at \(x\) by evaluating

\[\lim_{h \to 0} \frac{f(x+h) - f(x)}{h}\]

We begin by substituting with our function:

\[\lim_{h \to 0} \frac{[0.25(x+h)^2 - 2] - [0.25x^2 - 2]}{h}\]

\[= \frac{[0.25x^2 + 0.25(2xh) + 0.25h^2 - 2] - [0.25x^2 - 2]}{h}\]

\[= \frac{h(0.5x + 0.25h)}{h} = 0.5x + 0.25h\]

Taking the limit as \(h \to 0\) leaves \(0.5x\). Therefore the tangent line at any point \(x\) along this graph has a slope of \(0.5x\). By that we have established the relationship between the general slope of tangent lines along a graph and the graph's function.
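
As a quick numeric confirmation, a minimal sketch comparing difference quotients of \(f(x) = 0.25x^2 - 2\) (with a small \(h\)) against \(0.5x\) at the x values from the table above:

f <- function(x) 0.25*x^2 - 2
x <- -4:4
h <- 0.000001
round((f(x + h) - f(x))/h, 3)   # matches 0.5 * x at each x value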

A.13.1.4.1 Definition of a derivative function

In basic terms, the derivative at a point along a graph is the limit of the difference quotient at that point. In notation form, we define the derivative of \(y = f(x)\) at \(x\) as:

\[f'(x) = \lim_{h \to 0} \frac{f(x+h) - f(x)}{h}\]

This is if a limit exists.

\(f'(x)\) is used to denote the derivative function. \(f'\) is a new function whose domain is a subset of the domain of \(f\). This function (\(f'\)) is used to estimate slopes of tangent lines and instantaneous rates of change.

\(h \to 0^{-}\) and \(h \to 0^{+}\) are used to indicate left and right differentiation.

As an example, let's determine the derivative of \(f(x) = 3x^2 + 5\) at \(x\). To do this we begin by getting the slope of the secant through the points (x, f(x)) and (x+h, f(x+h)), then the slope at the point of tangency. The initial step thus involves using the difference quotient and simplifying it, while the second step involves taking the limit of the difference quotient.

\[\frac{f(x+h) - f(x)}{h} = \frac{[3(x+h)^2 + 5] - [3x^2 + 5]}{h}\]

Since \((x+h)^2 = x^2 + 2xh + h^2\) then

\[\frac{[3(x^2 + 2xh + h^2) + 5] - 3x^2 - 5}{h} = \frac{3x^2+6xh+3h^2+5-3x^2-5}{h}\]

This leads us to

\[\frac{6xh+3h^2}{h} = \frac{h(6x+3h)}{h} = 6x + 3h\]

which becomes slope of our secant line. Now we can determine slope of tangent line with tangency point on graph.

\[f'(x) = \lim_{h \to 0} \frac{f(x+h)-f(x)}{h} = \lim_{h \to 0}(6x + 3h) = 6x\]

With that we note slope of a tangent line along graph of \(f\) or at any point \((x, f(x))\) is:

\[m = f'(x) = 6x\]

Therefore, given \(x = 2\), we can compute slope of a tangent line at that point as:

\[m = f'(2) = 6(2) = 12\]
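
A minimal numeric check of this result: the difference quotient for \(f(x) = 3x^2 + 5\) at \(x = 2\) approaches 12 as \(h\) shrinks.

f <- function(x) 3*x^2 + 5
h <- c(0.1, 0.01, 0.001, 0.0001)
(f(2 + h) - f(2))/h   # 12.3, 12.03, 12.003, 12.0003, approaching 12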

A.13.1.4.2 Non-existence of a derivative

Non-differentiability means the defining limit does not exist at a given point: the derivative fails to exist at \(x = a\) if the following limit does not exist at \(x = a\):

\[f'(a) = \lim_{h \to 0} \frac{f(a+h) - f(a)}{h}\]

A.13.1.5 Denoting derivatives

There are two other ways of denoting derivatives of \(f\) at \(x\) other than \(f'(x)\), these are:

\(y'\) and \(\frac{dy}{dx}\)

We will use these symbols at various points depending on the situation, but they all mean the same thing.

A.13.1.6 Derivatives of Constants, Exponential Forms and Sums

In this section we go over a few ways of getting derivatives of most often used functions.

A.13.1.6.1 Derivative of a constant function

A constant function as mentioned earlier is expressed as:

\[f(x) = C\]

Where \(C\) is a constant.

The graph of this function is a horizontal line with slope 0. Given slope = 0, then \(f'(x) = 0\). We can show this using the two-step process we have been using to determine derivatives.

\[\frac{f(x+h) - f(x)}{h} = \frac{C - C}{h} = \frac{0}{h} = 0 \quad{} h \neq 0\]

\[\lim_{h \to 0} 0 = 0\]

\[\therefore f'(x) = 0\]

We can also express derivative of any \(y = f(x) = C\) as:

\[y' = 0 \quad{} \text{ and } \quad{} dy/dx = 0\]

Sometimes we can note \(y\) as \(C\) like:

\(C' = 0\) or \(\frac{d}{dx}C = 0\), which all mean \(y' = \frac{dy}{dx} = 0\)

Here are some examples of derivatives of constants showing how different derivative notations are used.

In each of the following functions, determine their derivatives

Function              Derivative
1. \(f(x) = 9\)       \(f'(x) = 0\)
2. \(y = e\)          \(y' = 0\)
3. \(y = -2\)         \(dy/dx = 0\)
4. \(y = 12\)         \(\frac{d}{dx}12 = 0\)

This is what is referred to as the derivative of a constant function rule. Derivative rules are basically formulas for computing derivatives of functions.

A.13.1.6.2 Exponential Rule

Exponential functions are functions expressed as \(f(x) = x^k\) with \(k\) being a real number \(\mathbb{R}\).

Examples of Exponential functions include:

\[f(x) = x \qquad{} h(x) = x^2 \qquad{} m(x) = x^3\]

Root functions like \(\sqrt{x}\) and \(\sqrt[3]{x}\) are also Exponential functions because an nth-root is basically an exponent of \(1/n\); for example, the square root of 16 is the same as 16 raised to one half.

\[\sqrt{16} = 16^{1/2} = 4\]

The domain of an Exponential function depends on its exponent; for example, a negative exponent excludes \(x = 0\), while an even root requires \(x \geq 0\).

In our derivative section we used a two-step process to generalize a slope function, which we used to determine the derivatives of two functions, \(f(x) = x^2\) and \(g(x) = x^3\); this gave us \(2a\) and \(3a^2\).

If we went ahead and generalized the slopes of the next two exponents, then:

for \(f(x) = x^4\) the derivative would be \(f'(x) = 4x^3\), and for \(f(x) = x^5\) the derivative would be \(f'(x) = 5x^4\)

With that we have an interesting pattern: the coefficient of the generalized slope equals the exponent of its function, and the exponent of the generalized slope is one less than the exponent of its function. That is:

If \(f(x) = x^n\), then \(f'(x) = nx^{n-1}\)

We can replicate this with our other derivative notation.

\(y' = nx^{n-1}\) and \(dy/dx = nx^{n-1}\)

This is what we call the Exponential Rule. The Exponential rule is therefore a formula for computing derivatives of Exponential functions.

Here are some interesting examples on using our Exponential rule formula.

  1. We want to get derivative of \(f(x) = x\).

\[f'(x) = 1x^{1-1} = x^0 = 1\]

We can confirm this algebraically as:

\[\frac{(x+h) - x}{h} = \frac{x+h-x}{h}=\frac{h}{h}=1\]

  2. Let us get the derivative of \(y = x^{-9}\)

\[y' = -9x^{-9-1} = -9x^{-10} \quad{} \text{or} \quad{} -\frac{9}{x^{10}}\]

  3. Now let us get the derivative of \(y = x^{3/2}\)

\[\frac{dy}{dx} = \frac{3}{2}x^{\frac{3}{2} - \frac{2}{2}} = \frac{3}{2}x^{1/2}\]
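
As a quick check, base R's symbolic differentiator D() (from the stats package) applies the same rule; each result below is algebraically equivalent to \(nx^{n-1}\) for the exponent shown.

D(expression(x), "x")
D(expression(x^-9), "x")
D(expression(x^(3/2)), "x")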

A.13.1.6.3 Derivative of a constant times a function

Given a function \(f(x) = kg(x)\), where \(k\) is a constant and \(g(x)\) is a function differentiable at \(x\), we can form its difference quotient as:

\[\frac{f(x+h) - f(x)}{h} = \frac{kg(x+h)-kg(x)}{h} = \frac{k[g(x+h)-g(x)]}{h}\]

Using the constant multiple rule for limits (the limit of a constant times a function is the constant times the limit of the function), then:

\[k\lim_{h \to 0} \frac{g(x+h)-g(x)}{h} = k.g'(x) = kg'(x)\]

We can therefore note that the derivative of a constant times a differentiable function is that constant times the derivative of that function.

In general we can express this constant-times-a-function rule as:

\(f'(x) = kg'(x)\) or \(y' = kg'\) or \(\frac{dy}{dx} = k\frac{dg}{dx}\)

That’s if \(y = f(x) = kg(x)\)

A.13.1.6.4 Derivatives of sums and differences

Given \(f(x) = g(x) + m(x)\) where \(g\) and \(m\) are differentiable at \(x\), then we can differentiate \(f\) at \(x\) as follows:

\[\frac{f(x+h) - f(x)}{h} = \frac{[g(x+h) + m(x+h)] - [g(x) + m(x)]}{h}\]

\[= \frac{g(x+h)+m(x+h)-g(x)-m(x)}{h} = \frac{g(x+h)-g(x)}{h}+ \frac{m(x+h)-m(x)}{h}\]

\[\lim_{h \to 0} \frac{f(x+h) - f(x)}{h}= \lim_{h \to 0}[\frac{g(x+h)-g(x)}{h}+\frac{m(x+h)-m(x)}{h}]\]

\[\lim_{h \to 0} \frac{g(x+h)-g(x)}{h} + \lim_{h \to 0}\frac{m(x+h) -m(x)}{h} = g'(x) + m'(x)\]

Basically this means the derivative of a sum of two differentiable functions is the sum of their derivatives. This also holds for the difference of two differentiable functions.

We can express this sum and difference rule as:

\[f'(x) = g'(x) \pm m'(x)\]

if \(y = f(x) = g(x) + m(x)\)

A.13.1.7 Derivatives of Products and Quotients

Derivatives of a constant, exponents, sums and differences were rather simple and intuitive. Derivatives of products and quotients are not quite so and for that reason we discuss them separately.

A.13.1.7.1 Derivatives of Products

We want to begin by identifying a pattern which we can use to establish derivatives of product.

Therefore, if we are given two functions, \(F(x) = x\) and \(G(x) = x^2\) and told a third function \(f(x)\) is a product of these two functions, that is \(f(x) = F(x)G(x) = x*x^2\), we want to establish derivative of \(f(x)\).

Let us work it out using our difference quotient and then get its limit.

\[\frac{f(x+h)-f(x)}{h}=\frac{[(x+h)*(x+h)^2] - [x*x^2]}{h} = \frac{(x+h)*(x^2+2xh+h^2)-x^3}{h}\]

\[= \frac{x^3+2x^2h+xh^2+x^2h+2xh^2+h^3-x^3}{h} = \frac{h(2x^2+xh+x^2+2xh+h^2)}{h}\]

\[\lim_{h \to 0}(2x^2 + xh + x^2 + 2xh + h^2) = 2x^2 + x^2 = 3x^2\]

From this, the derivative of \(f(x) = F(x)G(x) = x*x^2\) is equal to \(3x^2\). This derivative is actually the product of our initial function and the derivative of the second function, plus the derivative of our initial function times the second function. We can express this product rule as follows:

If \(y = f(x) = F(x)G(x)\) where \(F'(x)\) and \(G'(x)\) exist, then

\(f'(x) = F(x)G'(x) + F'(x)G(x)\)
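
A minimal check of the product rule at a single point (x = 2 is an arbitrary choice), together with base R's symbolic D():

x <- 2
x * (2*x) + 1 * x^2            # F(x)G'(x) + F'(x)G(x) at x = 2 gives 12, the same as 3*x^2
D(expression(x * x^2), "x")    # symbolic result, algebraically equivalent to 3*x^2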

A.13.1.7.2 Derivatives of quotients

Just like the product rule, derivatives of quotients have a formula. It states that the derivative of a quotient of two functions is the denominator times the derivative of the numerator, minus the numerator times the derivative of the denominator, all over the denominator squared. This quotient rule can be expressed as:

\[f(x) = \frac{F(x)}{G(x)} \qquad{} f'(x) = \frac{G(x)F'(x) - F(x)G'(x)}{[G(x)]^2}\]

if \(y = f(x) = \frac{F(x)}{G(x)}\)
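
The text gives no worked example here, so here is a minimal sketch with a function of our own choosing, \(f(x) = \frac{x^2}{x+1}\): the quotient rule gives \(f'(x) = \frac{(x+1)(2x) - x^2(1)}{(x+1)^2} = \frac{x^2 + 2x}{(x+1)^2}\), which we can check numerically.

f <- function(x) x^2/(x + 1)
f_prime <- function(x) (x^2 + 2*x)/(x + 1)^2   # result of the quotient rule
h <- 0.000001
(f(2 + h) - f(2))/h    # numerical derivative at x = 2
f_prime(2)             # both are about 0.8889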

A.13.1.8 Chain Rule: Exponential Form

In its basic form, the chain rule enables differentiation of composite functions expressed as \(f[g(x)]\), provided \(f(x)\) and \(g(x)\) are differentiable.

A.13.1.8.0.1 General exponent rule

We now know how to differentiate an exponential function using:

\[f'(x) = nx^{n-1}\]

For a function \(y = f(x) = x^n\) with \(n\) being a real number (\(\mathbb{R}\)).

However, if we wanted to get the derivative of an Exponential function of another function, something like \(f(x) = [g(x)]^3\) where \(g(x) = x^2\), we need another formula.

Like before, to generalize a formula for the derivative of \(f(x) = [g(x)]^n\), where \(g(x)\) is any differentiable exponential function, we need to establish a pattern.

Let us consider \(F(x) = [f(x)]^2\), which is similar to \(F(x) = f(x)f(x)\). Using product rule we can differentiate \(F\) as:

\[F'(x) = f(x)f'(x) + f(x)f'(x)\]

\[\therefore F'(x) = 2f(x)f'(x)\]

Now let's consider \(y = [f(x)]^3\), which is the same as \(y = [f(x)]^2f(x)\). Using the product rule we can thus differentiate it as:

\[\frac{d}{dx}[f(x)]^3 = [f(x)]^2\frac{d}{dx}f(x) + f(x)\frac{d}{dx}[f(x)]^2\]

substituting \(\frac{d}{dx}[f(x)]^2\) with what we got earlier that is \(2f(x)f'(x)\) we get

\[\frac{d}{dx}[f(x)]^3 = [f(x)]^2f'(x) + f(x)[2f(x)f'(x)]\]

\[\therefore \frac{d}{dx}[f(x)]^3 = 3[f(x)]^2f'(x)\]

With that we have a pattern which can hold true for any exponent, that is:

\[\frac{d}{dx}[f(x)]^n = n[f(x)]^{n-1}f'(x)\]

Where \(n\) is any positive integer

This is what we call the general exponent rule, part of a differentiation rule called the chain rule. This general rule can also be written more compactly as:

\[y' = nf^{n-1}f'\]

or

\[\frac{d}{dx}f^n = nf^{n-1}\frac{df}{dx}\]

Where \(f = f(x)\)

Example

Given:

\[f(x) = \frac{1}{(3x + 2)^2}\]

let us get its derivative \(f'(x)\)

To begin with, we know \(f\) can be written as:

\[f(x) = (3x + 2)^{-2}\]

then

\[f'(x) = -2(3x + 2)^{-3}*(3x + 2)'\]

\[(3x + 2)^{'} = 3\]

\[\therefore f'(x) = -2(3x + 2)^{-3} * 3 = -6(3x + 2)^{-3}\]
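
A minimal numeric check of this chain rule result at \(x = 1\) (an arbitrary point): both the difference quotient and the formula give about \(-6/125 = -0.048\).

f <- function(x) 1/(3*x + 2)^2
h <- 0.000001
(f(1 + h) - f(1))/h    # numerical derivative at x = 1
-6*(3*1 + 2)^-3        # formula: -6/125 = -0.048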

A.13.2 Graphing and Optimization

In this section we want to get a bit more acquainted with the shape of polynomials as core functions in statistics. We are also going to determine minimum and maximum points of their graphs. To do this we will rely on the concept of a derivative to determine the slope of a graph at a particular point.

As we go through this section, it would be good to recall that graph of a polynomial function with a positive degree \(n\) can have at most \((n-1)\) turning points and can cross x-axis at most \(n\) times.

With this in mind, the core issues we will discuss in this section are:

A.13.2.1 Continuity

A continuous graph is one without holes or breaks; it can be drawn from start to end without breakage. This concept of continuity is of importance when plotting and analyzing graphs.

Below are two graphs, \(f\) and \(g\); function \(f\) is continuous over an interval while function \(g\) is discontinuous at a certain point (\(x = 0\)).

op <- par("mfrow")
par(mfrow = c(1, 2))

x <- -5:5
plot(x, x, type = "l", xlab = "x", ylab = "y")
title("f(x) = x")
plot(x, x^2/(2*x), type = "l", col = 4, xlab = "x", ylab = "y")
points(0, 0, pch = 21, col = 4)
segments(-1, -0.5, -0.1, -0.1, col = 4)
segments(0.1, 0.1, 1, 0.5, col = 4)
title(expression(paste("g(x) =" , x^2/(2*x))))


par(mfrow = op)

We can thus define continuity of a function at point \(x = a\) with these conditions:

  1. \(\lim_{x \to a} f(x)\) exists
  2. \(f(a)\) exists
  3. \(\lim_{x \to a} f(x) = f(a)\)

If one or more of these conditions is violated, then the function is said to be discontinuous at \(x = a\).

We can also note a function is continuous on an open interval \((a, b)\) if it is continuous at each point on that interval.

Example

Let us consider our two functions \(f(x) = x\) and \(g(x) = x^2/(2x)\) and see how they adhere to these conditions.

For function \(f\)

We see that this function (\(f(x) = x\)) is continuous on open interval (\(-\infty\), \(\infty\)), therefore given point \(x = 0\):

  1. \(\lim_{x \to 0} f(x)\) exists, it is 0.
  2. \(f(0)\) also exists as \(f(0) = 0\)
  3. \(\lim_{x \to 0} f(x) = f(0) = 0\)

All three conditions are met.

For function \(g\)

We note a discontinuity marked by a hollow point on our graph, so for this point \(x = 0\):

  1. \(\lim_{x \to 0} g(x)\) exists, as the one-sided limits are the same; they are approaching 0.
  2. \(g(0)\) does not exist as 0 \(\div\) 0 is not defined
  3. \(\lim_{x \to 0} g(x) \ne g(0)\), since \(g(0)\) is NaN (undefined)

Here only the first condition is satisfied while the other two do not conform.

One-sided continuity

Like limits, we can also talk of one-sided continuity: continuity on the right at \(x = a\) means \(\lim_{x \to a^+}f(x) = f(a)\), and continuity on the left means \(\lim_{x \to a^{-}}f(x) = f(a)\)

Open, closed and half-closed intervals

A graph shaped like a semi-circle can be said to be continuous on a closed interval \([a, b]\), where \(a\) is the point where it begins and \(b\) the point where it ends. A function is continuous on a closed interval \([a, b]\) if it is continuous on \((a, b)\), continuous from the right at \(a\) and continuous from the left at \(b\).

A function can have a half-closed interval like \([0, \infty)\).

A.13.2.1.1 Continuity properties

There are some handy properties formulated to determine intervals of continuity for some important classes of functions without looking at their graphs or having to apply the conditions of continuity.

A general property states that, for any two functions continuous on the same interval, their sum, difference, product and quotient are continuous on that interval, except for any \(x\) values that make the denominator of the quotient 0.

Other properties of specific functions are:

  1. Constant functions (like \(f(x) = 7\)) and the identity function \(f(x) = x\) are continuous for all \(x\)
  2. Exponential functions of the form \(f(x) = x^n\), with \(n\) a positive integer, are continuous for all values of the base \(x\)
  3. Polynomial functions are continuous for all \(x\), like \(f(x) = x^4 + x^2 + 5\)
  4. Rational functions are continuous except for values which make the denominator 0, like \(x = 2\) in \(f(x) = 2x^2/(x - 2)\)
  5. Any positive odd root greater than 1 is continuous wherever its radicand is continuous, like \(x^3\) in \(f(x) = \sqrt[5]{x^3}\)
  6. Any positive even root is continuous wherever its radicand is continuous and non-negative, for example \(f(x) = \sqrt{x}\), whose radicand \(x\) must lie in \([0, \infty)\)

A.13.2.1.2 Infinite Limits

From our discussion above, we have noted one situation where a limit does not exist: when the values approached from the left and right are different. Let us consider another situation where a limit does not exist.

For functions whose values become extremely large in magnitude as \(x\) approaches \(a\), the limit does not exist. Often the extreme values approaching \(a\) from the left differ from those approaching \(a\) from the right, so the one-sided limits differ; even when they agree, the values are unbounded. The symbols \(\infty\) and \(-\infty\) are used to describe the behaviour of \(f(x)\) near \(a\), depending on whether the extreme values are positive or negative.

Example

Given function

\[f(x) = \frac{x}{x-1}\]

for \(x = 1\), \(f(1)\) is undefined (division by zero), and R reports it as Inf, meaning infinity. Exploring values approaching 1 from the left, we see they are negative and growing in magnitude, while those approaching 1 from the right are positive and also growing without bound.

ranges <- c(0.0001, 0.001, 0.01)
x <- c(1 - rev(ranges), 1, 1 + ranges)
y <- x/(x-1)
matrix(c(x, y), nrow = 2, byrow = TRUE, dimnames = list(c("x", "f(x)"), rep("", length(x))))
##                                                               
## x      0.99    0.999     0.9999   1     1.0001    1.001   1.01
## f(x) -99.00 -999.000 -9999.0000 Inf 10001.0000 1001.000 101.00

On both sides the values grow without bound and the two behaviours differ, so \(\lim_{x \to 1}f(x)\) does not exist. We can express these one-sided limits as:

\[\lim_{x \to 1^-} f(x) = -\infty \qquad{} \text{and} \qquad{} \lim_{x \to 1^+} f(x) = \infty\]

These two statements describe the graph of \(f\) as \(x\) approaches 1. As we can see below, the curve gets ever closer to \(x = 1\) without reaching it, heading towards infinity (\(\infty\)) on one side and negative infinity on the other. The vertical line the curve approaches is referred to as a vertical asymptote; in this case it is \(x = 1\).

x1 <- seq(0.9, 1.1, length.out = 100)
y1 <- x1/(x1-1)
plot(x1, x1/(x1 - 1), type = "n", xlab = "x", ylab = "f(x)")
title("f(x) = x/(x-1)")
lines(x1[x1 < 1], y1[which(x1 < 1)], col = 4)
lines(x1[x1 > 1], y1[which(x1 > 1)], col = 4)
abline(v = 1)

Let's also consider a situation where the function's values approaching \(a\) from both sides are positive. For this, let's consider the function

\[g(x) = \frac{x}{(x - 1)^4}\]

at \(x\) approaching 1

y <- function(x) x/(x -1)^4
matrix(c(x, y(x)), nrow = 2, byrow = TRUE, dimnames = list(c("x", "f(x)"), 1:length(x)))
##            1        2         3   4          5         6        7
## x    9.9e-01 9.99e-01 9.999e-01   1 1.0001e+00 1.001e+00 1.01e+00
## f(x) 9.9e+07 9.99e+11 9.999e+15 Inf 1.0001e+16 1.001e+12 1.01e+08

Graph of \(g\) should look like this:

x2 <- seq(-0.5, 2.5, 0.001) 
plot(c(-0.6, 2.6), c(0, 20), type = "n", xlab = "x", ylab = "g(x)")
title(expression(paste("g(x) =", x/(x - 1)^4)))
lines(x2, y(x2), col = 4)

Notice in both examples the graph approaches \(x = 1\) but never reaches it. This makes \(x = 1\) a vertical asymptote.

Vertical Asymptote

Generally, we can define a vertical asymptote as a line \(x = a\) for a graph if the limit of the function \(f\) does not exist as \(x\) approaches \(a\) from the left or right, because the values of \(f(x)\) are increasing in magnitude, either negatively or positively, towards infinity.

  • For large positive or negative values approaching \(a\) from left, we express it as:

\[\lim_{x \to a^-} f(x) = \infty \qquad{} (or -\infty) \]

  • For large positive or negative values approaching \(a\) from right we express it as:

\[\lim_{x \to a^+} f(x) = \infty \qquad{} (or - \infty)\]

If both sides are either positive or negative, then we can express them as:

\[\lim_{x \to a} f(x) = \infty \qquad{} (or -\infty)\]

A.13.2.1.3 Limits and Vertical asymptote at points of discontinuity

A function can have more than one discontinuity, and at each of these discontinuities there may or may not be a vertical asymptote. Here we want to see situations where a discontinuity has a vertical asymptote and where it does not.

Example

Let us consider this function:

\[f(x) = \frac{x - 3}{x^2 - 4x + 3}\]

From our discussion on properties of continuity, and particularly our fourth property, we know a rational function such as \(f\) is discontinuous where its denominator becomes 0. For this function, \(x = 1\) and \(x = 3\) make the denominator 0. Now let us look at values approaching 1 and 3 to determine the shape of the graph at these discontinuities.

We begin with \(x = 1\)

x <- c(1 - rev(ranges), 1, 1 + ranges)
y <- function(x) (x-3)/(x^2 - 4*x + 3)
matrix(c(x, y(x)), nrow = 2, byrow = TRUE, dimnames = list(c("x", "f(x)"), rep("", length(x))))
##                                                            
## x       0.99     0.999      0.9999    1     1.0001    1.001   1.01
## f(x) -100.00 -1000.000 -10000.0000 -Inf 10000.0000 1000.000 100.00

We can see values of \(f(x)\) approaching 1 from the left are large negative values, while those approaching from the right are large positive values. We can express this as:

\[\lim_{x \to 1^-} \frac{x - 3}{x^2 - 4x + 3} = -\infty \qquad{} \text{and} \qquad{} \lim_{x \to 1^+} \frac{x - 3}{x^2 - 4x + 3} = \infty\]

This means \(x = 1\) is a vertical asymptote for the graph of \(y = f(x)\)

For \(x = 3\)

x <- c(3 - rev(ranges), 3, 3 + ranges)
matrix(c(x, y(x)), nrow = 2, byrow = TRUE, dimnames = list(c("x", "f(x)"), rep("", length(x))))
##                                                                   
## x    2.9900000 2.9990000 2.999900   3 3.000100 3.0010000 3.0100000
## f(x) 0.5025126 0.5002501 0.500025 NaN 0.499975 0.4997501 0.4975124

There is no value at \(x = 3\), but the left and right sides of 3 are approaching 0.5, hence \(\lim_{x \to 3}f(x)\) exists and therefore there is no vertical asymptote at \(x = 3\).

We can represent this graphically as shown below:

x <- seq(-5, 5, 0.001)
plot(c(-6, 6), c(-5, 5), type = "n", xlab = "x", ylab = "y")
abline(v = 1, lty = "dashed")
lines(x, y(x), col = 4)
title(expression(paste("f(x) = ", (x-3)/(x^2 - 4*x + 3))))

A.13.2.1.4 Solving inequalities using continuity properties

Inequalities are statements whose left and right expressions are compared with \(<\), \(>\), \(\geqslant\), \(\leqslant\) or \(\neq\) rather than being set equal; \(<\) and \(>\) are specifically called strict inequalities.

For such statements, we can solve for the \(x\) values that satisfy the given inequality. For example, given:

\[\frac{x^2 - 1}{x - 3} > 0\]

we can determine which \(x\) values make the left-hand expression positive.

To do this we use a special number-line graph called a sign chart. Sign charts are basically charts which partition the \(x\) values into intervals and indicate the expected sign of \(f(x)\) on each interval. To construct these sign charts we will rely on two concepts: one is our continuity properties and the other is the sign property.

We want to use the continuity properties to determine intervals of continuity for a function, and then determine the sign of \(f(x)\) on those intervals. The sign on each interval is determined by the sign property on an interval \((a, b)\). This property states that "if \(f\) is continuous on \((a, b)\), and \(f(x) \ne 0\) for all \(x\) in \((a, b)\), then either \(f(x) > 0\) (positive) for all \(x\) in \((a, b)\) or \(f(x) < 0\) (negative) for all \(x\) in \((a, b)\)".

Let's look at an example to understand this sign property.

Example

Given an interval \((-10, 0)\) for a continuous function \(h\) with \(h(x) \ne 0\) for every \(x\) in that interval, and given \(h(-5) = -5\), we want to determine whether it is possible to get a positive \(h(x)\) for any \(x\) value in the interval.

Let us try and connect (-5, -5) with a positive value like (5, 5).

plot(c(-6, -3), c(-6, 6), type = "n", xlab = "x", ylab = "f(x)")
abline(h = 0, lty = 2)
text(-3.3, 0.35, labels = "h(x) = 0")
lines(c(-5, -5), c(-5, 5), col = 4)
points(c(-5, -5), c(-5, 5), pch = 21, bg = 4)

We can see to connect these two points with a continuous line we had to pass through \(h(x) = 0\) (crossing x-axis) thus violating given condition of \(h(x) \ne 0\).

We therefore must have a negative \(h(x)\) throughout for this function to be continuous and still meet the set condition. This reasoning still holds if we swapped the signs; this is what the sign property is about.

With that, we can note that, given one \(h(x)\) value, we can determine the sign of all other values in that interval.

We can now use this sign property to solve our earlier inequality. We start by making the left side a function \(h\).

\[h(x) = \frac{x^2 - 1}{x - 3}\]

From this function, we want to identify intervals of continuity by noting points of discontinuity. This function has one point of discontinuity, the point that makes the rational function's denominator 0 (that is, \(x = 3\)). Also recall we are not including points where \(h(x) = 0\), so we need to take note of those points as well.

For this function, \(h(x) = 0\) when \(x = -1\) and when \(x = 1\). Altogether we have three values which we will use to determine the continuous intervals of \(h\). We refer to these values (x = -1, 1 and 3) as partition numbers, which we can describe as values that determine open intervals on which \(h(x)\) does not change sign.

We can visualize these partitions on a real number line as shown below:

plot(1:10, type = "n", axes = FALSE, ann = FALSE)
abline(h = 5)
text(2:9, rep(5, length(2:9)), labels = "|")
points(c(4, 6, 8), rep(5, 3), pch = 21, bg = c("blue", "blue", "white"), cex = 1.2)
text(2:9, rep(4.4, length(2:9)), labels = -3:4)
text(9.8, 5.35, labels = "x-axis")

These partition numbers give us four open intervals: \((-\infty, -1)\), (\(-1, 1\)), (\(1, 3\)), and (\(3, \infty\)). On these intervals we know \(h\) is continuous, and therefore the sign is the same for all \(h(x)\) within each interval. What we now need is to find at least one \(h(x)\) in each of these intervals and determine its sign. For our example we can pick x = -2, 0, 2, and 4, which become what we refer to as evaluation numbers. \(h(x)\) at these points is -3/5, 1/3, -3, and 15, therefore the signs for these partitions are -, +, - and +.

Now we can construct a sign chart which is basically a real number line showing each partition, their sign and their evaluation numbers.

plot(1:10, type = "n", axes = FALSE, ann = FALSE)
abline(h = 5)
text(2:9, rep(5, length(2:9)), labels = "|")
points(c(4, 6, 8), rep(5, 3), pch = 21, bg = c("blue", "blue", "white"), cex = 1.2)
text(2:9, rep(4.4, length(2:9)), labels = -3:4)
text(c(2.5, 3, 5, 7, 9), rep(6, 5), labels = c("f(x)", rep(c("- - -", "+ + +"), 2)))
text(c(3, 5, 7, 9), rep(7, 4), labels = c(expression(paste("(-", infinity, ", -1)")), "(-1, 1)", "(1, 3)", expression(paste("(3, ", infinity, ")"))))
segments(x0 = c(3, 5, 7, 9), y0 = rep(4, 4), x1 = c(3, 5, 7, 9), y1 = rep(3, 4))
segments(3, 3, 9, 3)
text(6, 2.5, labels = "Evaluation numbers")
text(10, 4.7, labels = "x-axis")
segments(x0 = c(4, 6, 8)+.1, y0 = rep(5, 3), x1 = c(4, 6, 8), y1 = rep(8, 3), lty = "dashed")

We can now solve our inequality:

\[\frac{x^2 - 1}{x - 3} > 0\]

as \(-1 < x < 1\) or \(x > 3\)
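
A minimal R version of this evaluation: computing \(h(x)\) at the evaluation numbers shows which intervals satisfy the inequality.

h <- function(x) (x^2 - 1)/(x - 3)
evals <- c(-2, 0, 2, 4)    # one evaluation number per open interval
data.frame(x = evals, h = h(evals), sign = ifelse(h(evals) > 0, "+", "-"))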

A.13.2.2 First derivative

In this section we want to use derivatives, which are basically slopes of a graph at particular points, to inform us about the shape of a given function in addition to locating minimum and maximum points.

We shall go over four sub-topics, these are:

A.13.2.2.1 Increasing and Decreasing functions

One important aspect of graphing a function is determining intervals of increase and decrease. We can comfortably do this using a sign chart of its derivative function, that is, a sign chart of the slopes of tangent lines along the graph. By doing so we are able to tell where the graph is rising and where it is falling based on the sign. We are also able to tell its vertices (minimum and maximum points).

Example

Let us look at this function

\[h(x) = -x^5 - x^4 + 14x^3 + 6x^2 - 45x -3\]

its derivative is:

\[h'(x) = -5x^4 - 4x^3 +42x^2 + 12x - 45\]

This should look familiar, as we came across it when discussing factoring polynomials with the rational zeros theorem. If you recall, we were able to locate all points where this polynomial equals 0; these were \(x = -3.00,\text{ }-1.23,\text{ }1.00\text{ and } 2.43\). Since in this case these are zeros of our derivative function, they represent points where the slope equals zero. From our previous discussion we know slope = 0 occurs at a point on a graph where it is neither increasing nor decreasing; we referred to such points as vertices of a graph. Do recall polynomials are continuous, so we will not have any discontinuity points.

Here is a graph of our function \(h\) and a sign chart of its derivative \(h'\).

op <- par("mar")
par(mar = c(6.5, 2.1, 0.5, 1.1))
x <- sort(unique(c(seq(-4, 4, by = 0.01), hx_at_zero)))
hx <- expression((-x)^5 - x^4 + 14*x^3 + 6*x^2 - 45*x - 3)
plot(rep(c(-5,5), 2), c(rep(40, 2), rep(-40, 2)), type = "n", xlab = "", ylab = "", xaxt = "n")
abline(h = 0, v = 0, lty = "dashed")
legend("topright", legend = hx, bty = "n", box.col = "white")
lines(spline(x, eval(hx)), col = 4)
x <- hx_at_zero
points(hx_at_zero, eval(hx), pch = 21, bg = 4)
text(c(-4, -2, 0, 2, 4), rep(-46, 4), labels = c(-4, -2, 0, 2, 4), xpd = TRUE, cex = 0.9)

# Sign chart
rect(-4.3, -75, -3, -53, col = "lightblue", xpd = TRUE, border = NA)
rect(-3, -75, -1.23303, -53, col = "chocolate2", xpd = TRUE, border = NA)
rect(-1.23303, -75, 1, -53, col = "lightblue", xpd = TRUE, border = NA)
rect(1, -75, 2.43303, -53, col = "chocolate2", xpd = TRUE, border = NA)
rect(2.43303, -75, 4.1, -53, col = "lightblue", xpd = TRUE, border = NA)
segments(x0 = -4, y0 = -65, x1 = 4, y1 = -65, xpd = TRUE)
text(x = hx_at_zero, y = rep(-59, 4), labels = rep(0, length(hx_at_zero)), xpd = TRUE, cex = 0.7)
text(x = hx_at_zero, y = rep(-70, 4), labels = round(hx_at_zero, 1), xpd = TRUE)
text(-4.7, -70, labels = "h(x)", font = 2, xpd = TRUE, cex = 0.7)
points(hx_at_zero, y = rep(-65, 4), pch = 19, xpd = TRUE)
t <- c(-4, -2, 0, 2, 3)
t_num <- -5*t^4 - 4*t^3 + 42*t^2 + 12*t - 45
signs <- ifelse(t_num < 0, "----", "++++")
text(x = c(-3.7, -2, 0, 1.7, 3.4), rep(-59, length(signs)), labels = signs, xpd = TRUE)
text(-4.7, -59, labels = "h'(x)", cex = 0.7, font = 2, xpd = TRUE)
intervals <- c(expression(paste("(-", infinity, ", -3)")),"(-3, -1.23)","(-1.23, 1)","(1, 2.43)",expression(paste("(2.43, ", infinity, ")")))
text(x = c(-3.65, -2, 0, 1.7, 3.4), rep(-55, length(signs)), labels = intervals, cex = 0.8, xpd = TRUE)

par(mar = op)

It is clear to see that when the graph is decreasing its derivative has negative values, and when it is increasing its derivative has positive values. At points where the derivative is 0, the graph reaches a peak or trough (maximum or minimum). The graph extends without bound on both sides; this is because \(x\) can assume any number on the real number line, and for large \(x\) values on either side \(h(x)\) increases or decreases very fast towards infinity.

A.13.2.2.2 Partition Numbers and Critical values

Domain values of a function \(f\) are said to be critical values if they are partition numbers of its derivative \(f'(x)\). Partition numbers of \(f'\) are \(x\) values where \(f'(x) = 0\) or where \(f'(x)\) does not exist (points of discontinuity). However, it should be noted that not all partition numbers of \(f'\) are critical values of \(f\), since some of them are not in the domain of \(f\). On that note, critical values of a function \(f\) are always part of \(f\)'s domain.

Knowing these two terms and their distinction can guide us in understanding the shape of a graph at certain points; for example, a graph can be generally increasing but have one particular point where it has a slope of 0, or a point where the rate of increase or decrease changes. This can easily be shown by a sign chart where all evaluation numbers have the same sign but the partition numbers are either 0 or not defined.

We can best grasp this concept with examples.

f'(x) = 0 as a critical value of f

A cube function is a good example of a function whose derivative has one partition number that is also a critical value.

For example, given:

\[f(x) = -x^3\]

we establish its derivative as

\[f'(x) = -3x^2\]

Since this derivative is a square function, we know it is continuous for all values of \(x\), which means we do not expect a discontinuity point among its partition numbers. We also know there is only one value which makes this function zero, and that is \(x = 0\). Therefore we have one partition number for \(f'\), and since 0 is also in the domain of \(f\) (\(f(0) = 0\), there is no discontinuity) we know it is also a critical value of \(f\).

Let’s make a sign chart which will tell us more about shape of \(f\) even before plotting it.

plot(0:4, 0:4, type = "n", axes = FALSE, ann = FALSE)
rect(0, 0.8, 2, 3, col = "lightblue", border = NA)
rect(2, 0.8, 4.2, 3, col = "lightblue", border = NA)
text(c(1, 3), rep(2.8, 2), labels = c(expression(paste("(-", infinity, ", 0)")), expression(paste("(0, ", infinity, ")"))))
text(c(0.6, 1:3), rep(2.5, 4), labels = c("f'(x)", "----", 0, "----"), font = c(1, 2, 1, 2))
segments(0, 2, 4, 2)
text(4, 2.2, "x", font = 2)
text(c(1, 2, 3), rep(2, 3), labels = "|")
text(c(0.6, 1:3), rep(1.6, 4), labels = c("f(x)", -1:1))
text(c(1, 3), rep(1,2), labels = c("Decreasing", "Decreasing"))

From our sign chart we can see our graph is decreasing, then "flattens out" at the point where \(f'(x) = 0\) (that is, \(x = 0\)), and then resumes its descent.

Here is a plot of \(f\).

x <- seq(-4, 4, 0.0001)
y <- function(x) (-x)^3
plot(c(-5.5, 5.5), c(-64, 64), type = "n", xlab = "x", ylab = "f(x)")
abline(h = 0, v = 0, lty = "dashed")
lines(x, y(x), col = 4)

In this example we have seen that the partition number of \(f'\) is also a critical value of \(f\). We have also seen that where \(f'(x) = 0\), \(f\) is neither increasing nor decreasing, hence we have a horizontal tangent line at \(x = 0\).

Non-existent f'(x) as a critical value of f

A function's derivative can be undefined at a certain point even though the original function is continuous at that same point. A good example of this is cube root functions.

For example if we have the following cube root function

\[h(x) = \sqrt[3]{x+2} + 3\]

which we can express as an exponential function

\[h(x) = (x + 2)^{\frac{1}{3}} + 3\]

Then we can write it’s derivative as

\[h'(x) = \frac{1}{3}(x + 2)^{-\frac{2}{3}}\]

or simply

\[h'(x) = \frac{1}{3(x+2)^{\frac{2}{3}}}\]

Since this is a quotient, we expect a discontinuity where the denominator evaluates to zero, which is at \(x = -2\). We also know there is no point where \(h'(x) = 0\) because its numerator is the constant 1, which can never become zero. Therefore the derivative \(h'\) has one partition number, which is -2. -2 is not in the domain of \(h'\) (the denominator of \(h'(-2)\) is zero) but it is in the domain of \(h\), that is \(h(-2) = \sqrt[3]{-2+2} + 3 = 3\), which means -2 is a critical value of \(h\).

Let us look at sign chart of \(h'\) to predict shape of \(h\)’s graph.

plot(0:4, 0:4, type = "n", axes = FALSE, ann = FALSE)
rect(0, 0.8, 2, 3, col = "chocolate2", border = NA)
rect(2, 0.8, 4.2, 3, col = "chocolate2", border = NA)
text(c(1, 3), rep(2.8, 2), labels = c(expression(paste("(-", infinity, ", -2)")), expression(paste("(-2, ", infinity, ")"))))
t_num <- c(-3, -1)
signs <- ifelse(1/(3*(abs(t_num + 2)^(1/3))^2) < 0, "----", "++++")
text(c(0.6, 1:3), rep(2.5, 4), labels = c("h'(x)", signs[1], "ND", signs[2]), font = c(1, 2, 1, 2))
segments(0, 2, 4, 2)
text(4, 2.2, "x", font = 2)
text(c(1, 2, 3), rep(2, 3), labels = "|")
text(c(0.6, 1:3), rep(1.6, 4), labels = c("h(x)", -3:-1))
text(c(1, 3), rep(1,2), labels = c("Increasing", "Increasing"))

From this sign chart we can see the derivative of \(h\) is positive, then at \(x = -2\) it is not defined, and this is followed again by positive signs. Therefore on the graph of \(h\) we expect it to be increasing up to \(x = -2\), where the tangent line is vertical (the derivative is undefined), before it resumes its rise.

x <- seq(-5, 5, 0.001)
y <- c(-(abs(x[x <= -2] + 2))^(1/3) + 3, 
       (x[x > -2] + 2)^(1/3) + 3)
plot(x, y, type = "n", ylab = "h(x)")
abline(v = 0, h = 0, lty = "dashed")
lines(x, y, col = 4)
segments(-2, 2.3, -2, 3.7)
points(-2, 3, pch = 21, bg = 4)
text(x = -3.8, 4.65, labels = expression(paste("h(x) =", sqrt((x+2), 3) + 3)))

In this example we have noted that a continuous function can be increasing or decreasing even around points where its derivative does not exist. That is, the derivative can fail to exist at a given point \(x\) even though that point is in the domain of the original function.

Function without critical values

A point of discontinuity of a derivative is considered a partition number. In our preceding example we saw how such a point can be a critical value; here we look at an example where it is not a critical value (not in the domain of its function).

A good example is a quotient of first-degree polynomials which is discontinuous at the point where its denominator equals zero, and whose derivative is never 0.

Therefore given:

\[j(x) = \frac{6 + x}{x + 3}\]

we can get it’s derivative as

\[j'(x) = \frac{-3}{x^2+6x+9}\]

For \(j'\), there is a discontinuity at \(x = -3\), which means -3 is our only partition number. -3 is not in the domain of \(j\) since \(j(-3)\) is undefined (its denominator is zero and nearby values grow towards infinity). Therefore, we expect \(j\) not to be a continuous graph, with the discontinuity happening at -3.

We can anticipate the shape of \(j\)'s graph from this sign chart.

plot(0:4, 0:4, type = "n", axes = FALSE, ann = FALSE)
t_num <- c(-4, -2)   # evaluation numbers, defined before they are used below
j_prime <- expression((-3)/(t_num^2 + 6*t_num + 9))
rect(0, 0.8, 2, 3, col = ifelse(eval(j_prime) >= 0, "chocolate2", "lightblue"), border = NA)
rect(2, 0.8, 4.2, 3, col = ifelse(eval(j_prime) < 0, "lightblue", "chocolate2"), border = NA)
text(c(1, 3), rep(2.8, 2), labels = c(expression(paste("(-", infinity, ", -3)")), expression(paste("(-3, ", infinity, ")"))))
signs <- ifelse(eval(j_prime) < 0, "----", "++++")
text(c(0.6, 1:3), rep(2.5, 4), labels = c("j'(x)", signs[1], "ND", signs[2]), font = c(1, 2, 1, 2))
segments(0, 2, 4, 2)
text(4, 2.2, "x", font = 2)
text(c(1, 3), rep(2, 1), labels = "|")
points(2, 2, pch = 21, bg = "white")
text(c(0.6, 1:3), rep(1.6, 4), labels = c("j(x)", -4:-2))
text(c(1, 3), rep(1,2), labels = c("Decreasing", "Decreasing"))

From our sign chart we see the derivative of \(j\) is negative on the interval \((-\infty, -3)\) as well as on \((-3, \infty)\).

Therefore we expect graph of \(j\) to generally be decreasing.

Looking at this chart and computing \(j(-4)\) and \(j(-2)\), one might be tempted to conclude there is some increase, given that \(j(-4) = -2\) and \(j(-2) = 4\), but this would be incorrect: a sign chart is meant to show the general pattern within each interval of a graph rather than differences between points in different intervals.

We can now look at the graph of \(j\) to see what our sign chart revealed.

x <- seq(-4, -1, 0.01)
y = (6 + x)/(3 + x)
plot(x, y, type = "n", ylab = "j(x)")
abline(v = -3, h = 0, lty = "dashed")
lines(x, y, col = 4)
text(-1.4, 299, labels = expression(paste("j(x) =", (6+x)/(x+3))))

Notice how the graph breaks just before -3 as it heads downward towards negative infinity, then picks up again at very large positive values just after -3 and continues its decline.

In this example we see a point of discontinuity of a derivative which is also a point of discontinuity of its function, and is therefore not considered a critical value of that function.

It is good to note that intervals where a graph is increasing or decreasing should always be expressed as open intervals which are subsets of the function's domain.

A.13.2.2.3 Local Extrema

A point on a graph is called a local extremum if it is either a local minimum or a local maximum (the plural is local extrema). A local minimum is a vertex on the graph of a continuous function where it changes from a declining state to an increasing state. Basically we are looking at a point following an interval of decrease, and we refer to it as "local" because the interval being considered is the one nearest to that vertex.

A local maximum is also a vertex on a graph of a continuous function where it changes from an increasing state to a declining state.

Given a function, we can locate its local extrema by looking at its critical values which, as noted earlier, are points where the derivative equals zero or is not defined.

Generally, if a function \(j\) is continuous on an interval \((a, b)\) and \(c\) is a number within this interval, then \(j(c)\) can only be a local extremum if \(j'(c)\) is either equal to zero or undefined.

Do note, \(j'(c) = 0\) does not necessarily mean \(j(c)\) is a local extremum; we need to examine each critical value to determine whether it is a local minimum, a local maximum or neither.

A.13.2.2.4 First-Derivative Evaluation

This evaluation uses the sign of the derivative at nearby values on the left and right side of a critical value to establish whether it is a local extremum. If the sign changes from negative to positive, then it is a local minimum; if it changes from positive to negative, then it is a local maximum. If the sign does not change (both sides negative or both positive), then it is not a local extremum.

op <- par(c("mfrow", "mar", "mai"))
par(mfrow = c(2, 2), mar = c(2.1, 2.1, 2.1, 2.1), mai = rep(0.2, 4))

# Local minimum
plot(c(0, 4), c(0, 3), type = "n", axes = FALSE, ann = FALSE)
title("j(c) is a local minimum")
rect(0, 0.3, 2, 2.5, col = "lightblue", border = NA)
rect(2, 0.3, 4, 2.5, col = "chocolate2", border = NA)
text(c(0.3, 1, 3), rep(2, 3), labels = c("j'(x)", "----", "++++"))
segments(0, 1.5, 3.85, 1.5)
text(3.93, 1.5, "x", font = 2)
text(c(0.1, 3.8), rep(1.5, 2), labels = c("(", ")"))
text(c(0.1, 2, 3.8), rep(1.1, 3), labels = c("a", "c", "b"))
text(c(0.3, 1.3, 3), rep(0.5, 3), labels = c("j(x)", "Decreasing", "Increasing"))

# Local maximum
plot(c(0, 4), c(0, 3), type = "n", axes = FALSE, ann = FALSE)
title("j(c) is a local maximum")
rect(0, 0.3, 2, 2.5, col = "chocolate2", border = NA)
rect(2, 0.3, 4, 2.5, col = "lightblue", border = NA)
text(c(0.3, 1, 3), rep(2, 3), labels = c("j'(x)", "++++", "----"))
segments(0, 1.5, 3.85, 1.5)
text(3.93, 1.5, "x", font = 2)
text(c(0.1, 3.8), rep(1.5, 2), labels = c("(", ")"))
text(c(0.1, 2, 3.8), rep(1.1, 2), labels = c("a", "c", "b"))
text(c(0.3, 1.3, 3), rep(0.5, 3), labels = c("j(x)", "Increasing", "Decreasing"))

# Neither (all negative)
plot(c(0, 4), c(0, 3), type = "n", axes = FALSE, ann = FALSE)
title("j(c) is not a local extremum")
rect(0, 0.3, 2, 2.5, col = "lightblue", border = NA)
rect(2, 0.3, 4, 2.5, col = "lightblue", border = NA)
text(c(0.3, 1, 3), rep(2, 3), labels = c("j'(x)", "----", "----"))
segments(0, 1.5, 3.85, 1.5)
text(3.93, 1.5, "x", font = 2)
text(c(0.1, 3.8), rep(1.5, 2), labels = c("(", ")"))
text(c(0.1, 2, 3.8), rep(1.1, 2), labels = c("a", "c", "b"))
text(c(0.3, 1.3, 3), rep(0.5, 3), labels = c("j(x)", "Decreasing", "Decreasing"))

# Neither (all positive)
plot(c(0, 4), c(0, 3), type = "n", axes = FALSE, ann = FALSE)
title("j(c) is not a local extremum")
rect(0, 0.3, 2, 2.5, col = "chocolate2", border = NA)
rect(2, 0.3, 4, 2.5, col = "chocolate2", border = NA)
text(c(0.3, 1, 3), rep(2, 3), labels = c("j'(x)", "++++", "++++"))
segments(0, 1.5, 3.85, 1.5)
text(3.93, 1.5, "x", font =2)
text(c(0.1, 3.8), rep(1.5, 2), labels = c("(", ")"))
text(c(0.1, 2, 3.8), rep(1.1, 2), labels = c("a", "c", "b"))
text(c(0.3, 1.3, 3), rep(0.5, 3), labels = c("j(x)", "Increasing", "Increasing"))


par(mfrow = op$mfrow, mar = op$mar, mai = op$mai)

We can show this graphically. Here we see an example of local extrema for a continuous function, occurring at points where the derivative equals zero.

extr1 <- (2 - sqrt((-2)^2 - 4*3*(-14)))/6
extr2 <- (2 + sqrt((-2)^2 - 4*3*(-14)))/6
x <- sort(c(seq(-4, 4, 0.1), extr1, extr2))
ind <- c(which(x == extr1), which(x == extr2))
jx <- expression(x^3 - x^2 - 14*x + 11)
plot(x, eval(jx), type = "n", ylab = "j(x)")
abline(v = 0, h = 0, lty = "dashed")
lines(x, eval(jx), col = 4)
points(c(extr1, extr2), eval(jx)[ind], pch = 21, bg = c("chocolate2", "lightblue"))
title("Local Extrema")
text(c(extr1, extr2, 3.1), c(22, -10, 26), labels = c("Local Maximum", "Local Minimum", jx), cex = c(0.7, 0.7, 0.8))
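
A minimal numeric version of this first-derivative evaluation for the function in the plot above, \(j(x) = x^3 - x^2 - 14x + 11\): we classify each critical value by the sign of \(j'(x)\) just to its left and right.

j_prime <- function(x) 3*x^2 - 2*x - 14          # derivative of j(x)
crit <- c((2 - sqrt(172))/6, (2 + sqrt(172))/6)  # zeros of j'(x), as in the plot above
sapply(crit, function(cv) {
  left  <- sign(j_prime(cv - 0.1))
  right <- sign(j_prime(cv + 0.1))
  if (left < 0 && right > 0) "local minimum"
  else if (left > 0 && right < 0) "local maximum"
  else "not a local extremum"
})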

The example below shows two continuous functions whose local extrema occur at a point where their derivatives are not defined.

# Variables
x <- seq(-1, 1, 0.1)
n <- length(x)
hx <- expression(abs(x)^(1/3) + 0.5)

# Coordinate plane
plot(c(-4, 4), y = c(-3.5, 3.5), type = "n", xlab = "x", ylab = "")
abline(h = 0, lty = "dashed")
title("Local extrema with undefined derivatives")

# j(x) = |x|^(1/3)
#-------------------
lines(x, eval(hx), col = 4)
points(0, 0.5, pch = 21, bg = 4)
text(3.2, 3.2, labels = expression(paste("j(x) =", sqrt(abs(x), 3), "+ 0.5")), cex = 0.9)
text(0, 1.7, labels = "Local minimum", cex = 0.7)
# Sign chart for j'(x) = 1/(3*(|x|)^(2/3))
j_prime <- expression(1/(3*abs(x)^(2/3)))
j_prime <- ifelse(x < 0, -eval(j_prime), eval(j_prime))
signs <- ifelse(c(j_prime[which(x == -1)], j_prime[which(x == 1)]) < 0, "----", "++++")
rect(-4.3, 3, -3.3, 3.75, col = "lightblue", border = NA)
rect(-3.3, 3, -2.3, 3.75, col = "chocolate2", border = NA)
text(c(-4, -3.3, -2.7, -2), rep(3.35, 4), labels = c(signs[1], "ND", signs[2], "j'(x)"), cex = c(0.8, 0.7, 0.8, 0.8))

# h(x) = -|x|^(1/3)
#--------------------
lines(x, -eval(hx), col = 4)
points(0, -0.5, pch = 21, bg = 4)
text(3.2, -3.2, labels = expression(paste("h(x) =", sqrt(-abs(x), 3), "- 0.5")), cex = 0.9)
text(0, -1.7, labels = "Local maximum", cex = 0.7)
# Sign chart for h'(x) = -1/(3*(|x|)^(2/3))
h_prime2 <- expression(-(1/(3*abs(x)^(2/3))))
h_prime2 <- ifelse(x < 0, -eval(h_prime2), eval(h_prime2)) 
signs2 <- ifelse(c(h_prime2[which(x == -1)], h_prime2[which(x == 1)]) < 0, "----", "++++")
rect(-4.3, -3.8, -3.3, -3, col = "chocolate2", border = NA)
rect(-3.3, -3.8, -2.3, -3, col = "lightblue", border = NA)
text(c(-3.9, -3.3, -2.7, -2), rep(-3.4, 4), labels = c(signs2[1], "ND", signs2[2], "h'(x)"), cex = c(0.8, 0.7, 0.8, 0.8))

segments(rep(0, 2), c(-3.5, 2), rep(0, 2), c(-2, 3.5), lty = "dashed")

Below are examples of continuous functions which do not have a local extremum, even though one has a point where \(h'(x) = 0\) and the other has a point where its derivative is not defined.

op <- par(c("mfrow", "mar"))
par(mfrow = c(1, 2), mar = rep(2.1, 4))

# h'(x) = 0 but not a local extremum
x <- seq(-5, 5, 0.01)
plot(c(-5, 5), c(-127, 150), type = "n", ann = FALSE)
title(ylab = "h(x)", line = 2.2)
lines(x, x^3, col = 4)
points(0, 0, pch = 21, bg = 4)
text(-3, 148, labels = expression(paste("h(x) = ", x^3)))
mtext(text = "Not Local Extrema", side = 3, line = 1.2, font = 2, at = c(8, 170), xpd = TRUE)
mtext("h'(c) = 0", font = 2)
mtext(text = "x", side = 1, line = 1, at = c(7, -160), xpd = TRUE, font = 2)
rect(0, par("usr")[3], par("usr")[2], -110, border = NA, col = "chocolate2")
text(c(-0.8, 1.35, 2.7, 4.05), rep(-124.04, 4), labels = c("h'(x)", "++++", 0, "++++"))

# j'(x) = c not defined and not a local extremum 
y <- ifelse(x < 0, -abs(x)^(1/3), x^(1/3))
plot(c(-5, 5), c(-2, 3), type = "n", ann = FALSE)
lines(x, y, col = 4)
points(0, 0, pch = 21, bg = 4)
text(-2.9, 2.9, labels = expression(paste("j(x) = ", sqrt(x, 3))))
mtext("j'(c) not defined", font = 2)
rect(-0.7, par("usr")[3], par("usr")[2], -1.7, border = NA, col = "chocolate2")
text(c(-1.4, 0.5, 2.2, 4.05), rep(-1.95, 4), labels = c("j'(x)", "++++", "ND", "++++"))


par(mfrow = op$mfrow, mar = op$mar)
A.13.2.2.5 Analysing Graphs

We now know how to use derivatives to plot graphs and, as mentioned, this will become a handy skill in model formulation during our statistical analysis.

In addition to plotting graphs, derivatives can also give us useful information about a model even when we do not have the model itself. For example, suppose we are told the concentration of a particular drug in a patient's bloodstream can be modeled. We are not given this model, but we know its rate of change is given by the function below.

\[C'(t) = 3(t - 8)^2 - 12\]

where:

  • \(C\) = concentration
  • \(t\) = time in hours

Which has this graph:

C_prime <- expression(3*(t - 8)^2 - 12)
t <- seq(0, 16, 0.01)
plot(c(0, 16), c(-20, 200), type = "n", xlab = "t (hrs)", ylab = "Concentration")
abline(h = 0, lty = "dashed")
lines(t, eval(C_prime), col = 4)
points(c(6, 10), rep(0, 2), pch = 21, bg = 4)
text(2, 196, labels = expression(paste("C'(t) = ", 3*(t - 8)^2 - 12)))
title("Rate of change of drug concentration")

Given this information, what can we say about the model of this drug's blood concentration, and can we plot this model?

One thing to note here is that the graph of a derivative acts like a sign chart, because it tells us the intervals where a graph is increasing and where it is decreasing. For instance, this derivative is positive on the interval (0, 6), zero at 6, negative on the interval (6, 10), zero again at 10 and finally positive on the interval (10, 16).

Based on this, we can say drug concentration increases until 6 hours after being administered, when it reaches a local maximum. It then decreases from 6 hours until the tenth hour, when it reaches a local minimum. Finally it increases again up to the sixteenth hour. This pattern suggests a third-degree polynomial, as it has two extrema.

We can therefore use what we now know (local extrema, starting and ending time) to sketch a graph of what we expect this model to look like, although the sketch will not be fully accurate.

con <- expression((t - 8)^3 - 12*t)
t <- seq(4, 12, 0.01)
plot(x = c(3, 15), y = c(-115, -80), type = "n", ann = FALSE, yaxt = "n")
title("Model of drug concentration in blood stream")
title(xlab = "t (hours)", ylab = "C(x)", line = 2)
lines(t, eval(con), col = 4)
extrema <- c((6 - 8)^3 - 12*6, (10 - 8)^3 - 12*10)
segments(x0 = c(6, 10), y0 = c(par("usr")[3], par("usr")[3]), x1 = c(6, 10), y1 = extrema, lty = "dashed")
segments(x0 = c(5, 9), y0 = extrema, x1 = c(7, 11), y1 = extrema)
points(c(6, 10), extrema, pch = 21, bg = 4)

A.13.2.3 Second derivative

From our discussion on the derivative, the first derivative to be exact, we can now determine intervals where a graph is increasing and where it is decreasing. Now we want to know the shape of a graph by looking at the slope of its (first) derivative.

We establish this by determining the rate of change on each interval, that is, for each interval of a derivative we want to know whether it is increasing or decreasing. Intervals where a derivative is increasing have a shape different from intervals where it is decreasing.

As we did with the derivative, we use a sign chart to indicate the intervals where a derivative is increasing and where it is decreasing. We do this by taking the derivative of a function's derivative; this is what we call the second derivative.

There are three basic concepts to grasp as far as the second derivative is concerned: concave upward, concave downward and the point of inflection.

Concave upward refers to an interval where a derivative is increasing (whether positive or negative), while concave downward refers to an interval where a derivative is decreasing (whether positive or negative). Both concave upward and concave downward are referred to as concavity.

A point where concavity changes from upward to downward or from downward to upward is called a point of inflection. This is a point where a change of sign occurs on the second derivative, and from our preceding discussion we know partition numbers indicate points where a change of sign happens. Partition numbers in this case are points where the second derivative equals zero or does not exist.

We should therefore note that points of inflection only occur at partition numbers of the second derivative; however, not all partition numbers of the second derivative are points of inflection. This is because not all partition numbers have a change of sign on either side of them. We should also note that a partition number of the second derivative must be in the domain of its function.

Let’s look at an example to put these concepts into focus.

Example

Given \(j(x) = x^3\), its derivative \(j'(x) = 3x^2\) informs us the graph of \(j\) is increasing on the interval (\(-\infty, 0\)), zero at the point 0 and then continues to rise on the interval (\(0, \infty\)). Its second derivative \(j''(x) = 6x\) tells us the derivative is decreasing on the interval (\(-\infty, 0\)), 0 at 0 and then increasing on the interval (\(0, \infty\)). From the graph of \(j\) we can see a concave downward shape on the interval where the second derivative is negative and a concave upward shape on the interval where the second derivative is positive. The point of inflection is at 0.
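
As a quick sketch (not part of the original example), we can let R compute these derivatives symbolically with D() and confirm the expressions used above.

# First and second derivative of j(x) = x^3, computed symbolically
j_expr <- expression(x^3)
(j_prime <- D(j_expr, "x"))        # 3 * x^2
(j_second <- D(j_prime, "x"))      # 6x, printed as 3 * (2 * x)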

x <- seq(-1.5, 1.5, 0.01)
plot(c(-2, 2), c(-2, 2), type = "n", ann = FALSE)
abline(h = 0, lty = "dashed")
lines(x, x^3, lwd = 2, col = 4)
title("Concavity of a graph")
title(xlab = expression(paste("j(x) = ", x^3)), ylab = "j(x)", line = 2)
text(1 - 0.2, 1^3, labels = "Upwards", srt = 45, cex = 0.8)
text(-1 + 0.2, (-1)^3, labels = "Downwards", srt = 45, cex = 0.8)

# Sign chart of j'(x)
rect(par("usr")[1], 1.5, -1.2, par("usr")[4], col = "chocolate2", border = NA)
text(c(-1.9, -1.6, -1.35, -1), rep(1.8, 4), labels = c("+++", 0, "+++", "j'(x)"))

# Sign chart of j''(x)
rect(1.2, par("usr")[3], 1.6, -1.5, col = "lightblue", border = NA)
rect(1.6, par("usr")[3], par("usr")[2], -1.5, col = "chocolate2", border = NA)
text(c(1, 1.35, 1.6, 1.9), rep(-1.8, 4), labels = c("j''(x)", "---", 0, "+++"))
points(0, 0, pch = 21, bg = 4)
text(rep(0, 2), c(0.4, 0.2), c("Inflection", "point"), cex = 0.7)

A.13.2.3.1 Second-derivative evaluation

Suppose we do not have a graph, or do not want to draw a graph of a function, but we want to locate its vertices or local extrema. We can then use the second derivative to do this, using what is called the second-derivative evaluation. Using this evaluation we can locate a function's local maxima and minima.

If the derivative \(h'(a) = 0\) and the second derivative \(h''(a) > 0\), then we know the graph of \(h\) is concave upward on an interval containing the point \(x = a\). The derivative changes from negative on the left of \(a\) to positive on its right, so \(h(a)\) is a local minimum.

However, if the derivative \(h'(a) = 0\) and the second derivative \(h''(a) < 0\), then the graph of \(h\) is concave downward on an interval containing \(x = a\). The derivative changes from positive on the left to negative on the right, which implies \(h(a)\) is a local maximum.

A point where the derivative \(h'(a) = 0\) and the second derivative \(h''(a) = 0\) tells us nothing; it could be a local extremum or a point of inflection, hence we need to use the first-derivative evaluation to determine its shape.

Do take note, in the explanation above we are taking \(a\) as a critical value of \(h(x)\).
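
As a minimal sketch of this evaluation in R, assume the hypothetical function \(h(x) = x^3 - 3x\) (not one of the examples above), whose critical values are \(x = -1\) and \(x = 1\):

# Second-derivative evaluation for the hypothetical h(x) = x^3 - 3x
h_expr <- expression(x^3 - 3*x)
h_prime <- D(h_expr, "x")          # 3 * x^2 - 3
h_second <- D(h_prime, "x")        # 6x
x <- c(-1, 1)                      # critical values: h'(x) = 0 here
eval(h_prime)                      # both evaluate to 0
ifelse(eval(h_second) < 0, "local maximum", "local minimum")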

Let’s look at an example to see how we bring together all these concepts of limits, interval and signs, first derivative (increasing/decreasing properties) and second derivative (concavity properties) to analyse or predict shape of a graph.

Example

Suppose we are given the following graph of the derivative of a function \(h\) and asked to discuss and plot a possible graph of \(h\); what do we need to take into consideration to achieve this task?

third <- expression(x^3 - 2*x)
third_vertices <- c(-sqrt(2), -sqrt(2/3), sqrt(2/3), sqrt(2))
x <- sort(c(third_vertices, seq(-2, 2, 0.01)))
plot(c(-5, 5), c(-5, 5), type = "n", ann = FALSE)
abline(v = 0, h = 0, lty = "dashed")
lines(x, eval(third), col = 4)
title("Derivative of function h", line = 1)
title(xlab = "x", ylab = "h'(x)", line = 2.5)
points(c(third_vertices, 0), y = c(eval(third)[which(x %in% third_vertices)], 0), pch = 21, bg = 4)
text(4, -4.2, labels = expression(paste("h'(x) = ", x^3 - 2*x)), cex = 0.9)
text(third_vertices, rep(-4.5, 4), labels = c(expression(-sqrt(2)), expression(-sqrt(frac(2, 3))), expression(sqrt(frac(2, 3))), expression(sqrt(2))), cex = 0.7)

From this plot we can see six intervals (\(-\infty, -\sqrt{2}\)), (\(-\sqrt{2}, -\sqrt{2/3}\)), (\(-\sqrt{2/3}, 0\)), (\(0, \sqrt{2/3}\)), (\(\sqrt{2/3}, \sqrt{2}\)), and (\(\sqrt{2}, \infty\)).

On the initial interval (\(-\infty, -\sqrt{2}\)), the derivative is negative and increasing, so the graph of \(h\) is decreasing with a concave upward shape. The second interval (\(-\sqrt{2}, -\sqrt{2/3}\)) is positive and increasing, so the graph of \(h\) is increasing and concave upward there; since the first interval was negative and the second is positive, the point \(x = -\sqrt{2}\) is a local minimum. The third interval (\(-\sqrt{2/3}, 0\)) is positive but decreasing, so the graph of \(h\) is increasing and concave downward; since there was no change of sign from the second to the third interval, the point \(x = -\sqrt{2/3}\) is not a local extremum but a point of inflection. The fourth interval (\(0, \sqrt{2/3}\)) is negative and decreasing, so the graph of \(h\) is decreasing and concave downward; since the third interval was positive while this one is negative, the point \(x = 0\) is a local maximum. The fifth interval (\(\sqrt{2/3}, \sqrt{2}\)) is negative but increasing, suggesting a concave upward shape; given that there was no change in sign from the fourth to the fifth interval, the point \(x = \sqrt{2/3}\) is a point of inflection but not a local extremum. The sixth interval (\(\sqrt{2}, \infty\)) is positive and increasing, suggesting the graph is increasing and concave upward; since the fifth interval was negative and this interval is positive, the point \(x = \sqrt{2}\) is a local minimum. The table below summarizes all this.

| \(x\) | \(h'(x)\) | Shape of \(h\) |
|---|---|---|
| \(-\infty < x < -\sqrt{2}\) | Negative and increasing | Decreasing and concave upward |
| \(-\sqrt{2}\) | x-intercept | Local minimum |
| \(-\sqrt{2} < x < -\sqrt{2/3}\) | Positive and increasing | Increasing and concave upward |
| \(-\sqrt{2/3}\) | Local maximum | Point of inflection |
| \(-\sqrt{2/3} < x < 0\) | Positive but decreasing | Increasing and concave downward |
| \(0\) | x-intercept | Local maximum |
| \(0 < x < \sqrt{2/3}\) | Negative and decreasing | Decreasing and concave downward |
| \(\sqrt{2/3}\) | Local minimum | Point of inflection |
| \(\sqrt{2/3} < x < \sqrt{2}\) | Negative but increasing | Decreasing and concave upward |
| \(\sqrt{2}\) | x-intercept | Local minimum |
| \(\sqrt{2} < x < \infty\) | Positive and increasing | Increasing and concave upward |

Based on the information we have, and particularly our three local extrema (local minimum, local maximum and local minimum), the most probable function for \(h\) is a fourth-degree polynomial, as shown below.

h <- expression(2*x^4 - 4*x^2 + x - 1)
plot(-c(-2.5, 2.5), c(-4.2, 1.8), type = "n", ann = FALSE, axes = FALSE, frame.plot = TRUE)
extrema <- c(-1.0574538, 0.1270510, 0.9304029)
abline(v = extrema[2], lty = "dashed")
x <- sort(c(seq(-1.7, 1.5, 0.001), extrema))
lines(x, eval(h), col = 4)
x <- sort(c(extrema, -0.5774538, 0.57705))
points(x, eval(h), pch = 21, bg = 4)
axis(1, at = x, labels = rep("", length(x)))
text(x = x, y = rep(-5.4, length(x)), labels = c(expression(-sqrt(2)), expression(-sqrt(frac(2,3))), 0, expression(sqrt(frac(2, 3))), expression(sqrt(2))), cex = c(rep(0.7, 2), 0.9, rep(0.7, 2)), xpd = TRUE)
text(extrema[2], -6.3, labels = "x", xpd = TRUE)
text(2.6, -1, labels = "Fourth-degree polynomial", srt = 90)
title("Sketch of a possible graph of 'h'")

A.13.2.3.2 Curve Sketching Techniques
A.13.2.3.2.1 Limits at infinity

In this section we want to see what happens to the graph of a function as \(x\) increases or decreases without bound, that is, as \(x \to \infty\) or \(x \to -\infty\). We shall look at power functions (functions of the form \(x^p\)), polynomial functions and rational functions.

Limits at infinity for power functions

Here we want to explore the graph and limits of power functions as \(x\) increases and decreases without bound.

Let's begin by looking at power functions when \(x\) increases to an extreme value. We will do this with two functions, \(j(x) = x^2\) and \(h(x) = 1/x^2\).

Given \(x\) values of one hundred, one thousand, ten thousand, one hundred thousand, and one million, let's evaluate \(j(x)\) and \(h(x)\) to see what happens for high values of \(x\).

x <- c(100, 1000, 10000, 100000, 1000000)
y1 <- expression(x^2)
y2 <- expression(1/x^2)
matrix(c(eval(y1), eval(y2)), nrow = 2, byrow = TRUE, dimnames = list(c(y1, y2), x = x))
##        x
##           100  1000 10000 1e+05 1e+06
##   x^2   1e+04 1e+06 1e+08 1e+10 1e+12
##   1/x^2 1e-04 1e-06 1e-08 1e-10 1e-12

From this table we can see \(x^2\) increases as \(x\) increases towards an arbitrarily large value, which we refer to as infinity (\(\infty\)). Therefore we can say that as \(x\) increases to infinity, so does \(x^p\), where \(p\) stands for the power \(x\) is raised to.

Symbolically we can denote this as:

\[x \to \infty \qquad{} \qquad{} x^p \to \infty\]

or

\[\lim_{x \to \infty} x^p = \infty\]

From this table we also see that as \(x\) increases, \(1/x^2\) decreases towards 0 without ever reaching it. The reasoning here is that dividing a constant by a very large value outputs a very small number approaching 0. Therefore we can say that as \(x\) increases to infinity, \(1/x^p\) decreases towards 0. Symbolically we can denote this as:

\[x \to \infty \qquad{} \qquad{} \frac{1}{x^p} \to 0\]

or

\[\lim_{x \to \infty} \frac{1}{x^p} = 0\]

This is shown in figure below.

plot(c(0, 2.1), c(0, 4.1), type = "n", ann = FALSE, xaxt = "n")
axis(1, at = 0:2)
x <- seq(0, 2, 0.00001)
lines(x, eval(y1), col = 4)
text(1.8, 1.8^2 + 0.4, labels = expression(paste("j(x) = ", x^2)), srt = 45)
x <- seq(0.5, 2, 0.00001)
lines(x, eval(y2), col = "chocolate2")
text(1.8, 1/1.8^2 + 0.3, labels = expression(paste("h(x) = ", 1/x^2)), srt = -15)
title('Power functions for high "x" values')

Note, for function \(h\), the graph approaches the x-axis, that is, the line \(y = 0\). This line is what we refer to as a horizontal asymptote, as it is the value of \(y\) we approach as \(x\) increases without bound.

Power functions for decreasing \(x\) values follow a similar pattern, except that no real value is obtained if \(x\) is negative and \(0 < p < 1\). The value of \(p\) also determines whether \(x^p\) approaches \(\infty\) or \(-\infty\) as \(x\) decreases without bound: if \(p\) is even, like 2, then it approaches \(\infty\); if it is odd, like 3, then it approaches \(-\infty\).

Other infinity limits for power functions are:

\[\text{1. } \lim_{x \to -\infty} \frac{c}{x^p} = 0 \qquad{} \qquad{} \text{2. } \lim_{x \to \infty} \frac{c}{x^p} = 0\]

\[\text{3. } \lim_{x \to -\infty} cx^p = \pm \infty \qquad{} \qquad{} \text{4. } \lim_{x \to \infty} cx^p = \pm \infty\]

where \(p\) is a positive real number and \(c\) is any constant.

Limits at infinity for polynomial functions

Given a polynomial, we can transform it into a reciprocal form to make it convenient to see how it behaves as \(x\) approaches \(\infty\) or \(-\infty\).

In that regard, given:

\[j(x) = 3x^5 - 2x^4 + 4x^2 + 6\]

We can factor out the first term, which outputs reciprocal terms

\[j(x) = 3x^5(1 - \frac{2x^4}{3x^5} + \frac{4x^2}{3x^5} + \frac{6}{3x^5})\]

The limit of what is in the brackets thus becomes

\[\lim_{x \to \infty}(1 - \frac{2x^4}{3x^5} + \frac{4x^2}{3x^5} + \frac{6}{3x^5}) = 1 - 0 + 0 + 0 = 1\]

Here we used the limit of a power function of the form \(1/x^p\) and that of a constant, discussed in our section on properties of limits.

We can now say that as \(x\) increases towards infinity, what is in the brackets is approximately 1, and when multiplied by the factored term we are left with the factored term, which is really our first (highest-degree) term. The same happens for decreasing values of \(x\) heading towards negative infinity.

In conclusion, for any polynomial, as \(x\) increases towards infinity or decreases towards negative infinity, its limit is the same as that of its first (highest-degree) term.

Given a polynomial of the form:

\[j(x) = a_nx^n + a_{n-1}x^{n-1} + ... + a_1x + a_0 \qquad{} a_n \ne 0, \qquad{} n \geqslant 1\]

We can symbolically represent its limit as:

\[\lim_{x \to \infty} j(x) = \lim_{x \to \infty} a_nx^n = \pm \infty\]

and

\[\lim_{x \to -\infty} j(x) = \lim_{x \to -\infty} a_nx^n = \pm \infty\]

It is useful to note that polynomials of degree one or higher cannot have a horizontal asymptote. This is because the limit at infinity of a polynomial equals that of its first (highest-degree) term, which increases or decreases without bound. The exception is polynomials of degree zero, which are constant functions with a horizontal linear graph.
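
As a quick numeric sketch of this, we can compare the polynomial \(j(x) = 3x^5 - 2x^4 + 4x^2 + 6\) from above with its leading term for increasingly large \(x\); the ratio approaches 1.

# Ratio of j(x) to its leading term 3x^5 approaches 1 as x grows
x <- c(10, 100, 1000, 10000)
j <- 3*x^5 - 2*x^4 + 4*x^2 + 6
j / (3*x^5)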

Limits at infinity and horizontal asymptotes for rational functions

Rational functions are basically ratios of two polynomials, which means we can determine their limits at infinity from the ratio of the leading terms of the two polynomials.

For example, given

\[h(x) = \frac{3x^4 - 2x^3 + 6x + 2}{9x^4 + 4x^2 + 2x - 1}\]

we can get its limit by factoring highest term from both polynomials and reducing ratio of factored terms.

\[h(x) = \frac{3x^4(1 - \frac{2x^3}{3x^4} + \frac{6x}{3x^4} + \frac{2}{3x^4})}{9x^4(1 + \frac{4x^2}{9x^4} + \frac{2x}{9x^4} - \frac{1}{9x^4})}\]

\[\lim_{x \to \infty} h(x) = \lim_{x \to \infty}\left(\frac{1}{3} \cdot \frac{1 - 0 + 0 + 0}{1 + 0 + 0 + 0}\right) = \frac{1}{3}\]

In general, for any rational function of the form:

\[h(x) = \frac{a_mx^m + a_{m-1}x^{m-1} + ... + a_1x + a_0}{b_nx^n + b_{n-1}x^{n-1} + ... + b_1x + b_0} \qquad{} a_m \ne 0, \quad{} b_n \ne 0\]

then

\[\lim_{x \to \infty} h(x) = \lim_{x \to \infty}\frac{a_mx^m}{b_nx^n} \qquad{} \text{ and } \qquad{} \lim_{x \to -\infty} h(x) = \lim_{x \to -\infty} \frac{a_mx^m}{b_nx^n}\]

From this generalization, there are three possible outcomes, two of which have a horizontal asymptote.

The first outcome, where \(y = 0\) (the x-axis) is a horizontal asymptote, occurs if the degree of the numerator is less than that of the denominator, that is \(m < n\). In this case \(\lim_{x \to \infty} h(x) = \lim_{x \to -\infty} h(x) = 0\).

The second outcome is when the degree of the numerator equals that of the denominator, or \(m = n\); the limit is then given by \(a_m/b_n\), that is \(\lim_{x \to \infty} h(x) = \lim_{x \to -\infty} h(x) = a_m/b_n\). The line \(y = a_m/b_n\) is its horizontal asymptote.

The third outcome is when the degree of the numerator is greater than that of the denominator, or \(m > n\); each limit will then be either \(\infty\) or \(-\infty\), depending on the values of \(m, n, a_m \text{ and } b_n\). In this case there is no horizontal asymptote, as the graph heads towards plus or minus infinity.

Overall, a rational function can have at most one horizontal asymptote.
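
As a quick numeric sketch, we can evaluate the rational function from the example above at increasingly large \(x\) values and watch it settle towards 1/3:

# h(x) from the example approaches a_m/b_n = 3/9 = 1/3 as x grows without bound
h <- function(x) (3*x^4 - 2*x^3 + 6*x + 2) / (9*x^4 + 4*x^2 + 2*x - 1)
h(c(10, 100, 1000, 10000))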

A.13.2.3.3 Locating Vertical Asymptotes

At a given point, a vertical asymptote occurs when the denominator of a rational function evaluates to zero while the numerator evaluates to a value that is not zero. That is to say, if a function

\[j(x) = \frac{a(x)}{b(x)}\]

where \(a\) and \(b\) are continuous at the point \(x = h\), and at this point the denominator \(b(x) = 0\) while the numerator \(a(x) \ne 0\), then the line \(x = h\) is a vertical asymptote of function \(j\).

If both the denominator and the numerator evaluate to zero, then the limit is indeterminate, hence the need for algebraic simplification.

Example

Given

\[h(x) = \frac{x^2 + 2x + 3}{x^2 + x - 6}\]

Let \(a(x)\) be \(x^2 + 2x + 3\) and \(b(x)\) be \(x^2 + x - 6\).

We can factor denominator to locate points where it evaluates to zero

\[x^2 + x - 6 = (x - 2)(x + 3)\]

We have \(x = 2\) and \(x = -3\). Substituting both values into \(a(x)\) we note that the numerator is not zero at either point, hence \(x = 2\) and \(x = -3\) are vertical asymptotes of \(h\).

hx <- expression((x^2 + 2*x + 3)/(x^2 + x - 6))
x <- seq(-5, 5, 0.1)
plot(x, eval(hx), type = "l", col = 4, ann = FALSE)
abline(v = c(-3, 2), lty = "dashed")
title("Vertical asymptotes", sub = expression(paste("hx = ", (x^2 + 2*x + 3)/(x^2 + x - 6))), font.sub = 2)
title(xlab = "x", ylab = "h(x)", line = 2)

Remember, if at a given point both \(a(x)\) and \(b(x)\) evaluate to zero, then algebraic simplification would have to be called for.

A.13.2.3.4 Graphing Strategy

In this section we summarize all we have been discussing as far as graphing and analyzing graphs is concerned. We do this in four steps which we refer to as a graphing strategy.

The initial step requires us to analyze a given function in terms of its domain values; these are the real numbers which produce values for the function. This step also requires us to locate intercepts, that is the \(y\) and \(x\) intercepts. The y-intercept is the value of the function at \(x = 0\), and the x-intercepts are the points where the function evaluates to 0. Finally, this step requires us to locate the function's asymptotes.

Step two requires us to analyze the derivative of the given function. We do this by identifying critical values and then constructing a sign chart for the derivative. Using this sign chart we can establish which intervals are positive and which are negative, which tells us the intervals where the function is increasing and decreasing. The derivative also identifies local minimum and maximum points.

Step three requires us to analyze the second derivative of the given function. From its sign chart we can establish the concavity of the graph as well as the inflection points.

The last step involves sketching the graph of the given function using the information gathered.

Example

As an example, let’s revisit our polynomial function \(h\)

\[h(x) = -x^5 - x^4 + 14x^3 + 6x^2 - 45x - 3\]

and see how much we can know about its graph even before plotting.

Step 1

Going by our sketching strategy, initial step is to analyse \(h\) in terms of:

a). its domain values,

b). intercepts, and

c). asymptotes

a). The domain of \(h\) is the set of all real numbers \(x\) which produce real values for \(h(x)\); since \(h\) is a polynomial, this is all real numbers.

h <- function(x) (-x)^5 - x^4 + 14*x^3 + 6*x^2 - 45*x - 3
b). Intercepts

y-intercept or \(h(0)\) is equal to -3.

The x-intercepts are approximately -3.49, -2.28, -0.07, 2.15 and 2.66.

c). Asymptotes

\(h\) is a polynomial function and therefore it has neither a horizontal nor a vertical asymptote.
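
As a quick numeric sketch, the x-intercepts listed in (b) can be checked with R's polyroot(), which takes the polynomial's coefficients in increasing order of power:

# Roots of h(x) = -x^5 - x^4 + 14x^3 + 6x^2 - 45x - 3,
# coefficients supplied from the constant term upwards
sort(Re(polyroot(c(-3, -45, 6, 14, -1, -1))))
# should agree with the intercepts above (about -3.49, -2.28, -0.07, 2.15, 2.66)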

Step 2: Analyse derivative of \(h\)

The critical values of \(h\) are approximately -3, -1.2, 1 and 2.4; these are also partition numbers of its derivative

\[h'(x) = -5x^4 - 4x^3 + 42x^2 + 12x - 45\]

h_p <- expression((-5)*x^4 - 4*x^3 + 42*x^2 + 12*x - 45)
x <- c(-4, -2, 0, 2, 3)
signs <- ifelse(eval(h_p) < 0, "---", "+++")

Sign chart for this derivative is:

plot(c(-1.2, 20), c(-2, 6), type = "n", axes = FALSE, ann = FALSE)
rect(0, 0, 4, 4, border = NA, col = "lightblue")
rect(4, 0, 8, 4, border = NA, col = "chocolate2")
rect(8, 0, 12, 4, border = NA, col = "lightblue")
rect(12, 0, 16, 4, border = NA, col = "chocolate2")
rect(16, 0, 20, 4, border = NA, col = "lightblue")
mid <- c(2, 6, 10, 14, 18)
text(mid, rep(3.1, length(mid)), labels = c(expression(paste("(-", infinity, ", -3)")), "(-3, -1.2)", "(-1.2, 1)", "(1, 2.4)", expression(paste("(2.4,", infinity,")"))), cex = 0.9)
text(c(-0.8, c(mid, 4, 8, 12, 16)), rep(2, 6), labels = c("h'(x)", signs, rep(0, 4)))
text(c(-0.8, mid), rep(0.5, 6), labels = c("h(x)", "Decreasing", "Increasing", "Decreasing", "Increasing", "Decreasing"), cex = c(1, rep(0.7, 5)))
segments(0, 1.2, 20, 1.2)
points(c(4, 8, 12, 16), rep(1.2, 4), pch = 20)
text(20.3, 1.2, "x", cex = 0.9, font = 2)
text(c(4, 8, 12, 16), rep(0.5, 4), labels = c(-3, -1.2, 1, 2.4), cex = 0.7, font = 2)  # critical values of h

Based on this sign chart, graph of \(h\) begins with a decrease then a local minimum, an increase then a local maximum, a decrease then a local minimum, an increase then a local maximum and finally a decrease. This suggests a fifth-degree polynomial.

Step 3: Analyse the second derivative of \(h\)

The second derivative of \(h\) is

\[h''(x) = -20x^3 - 12x^2 + 84x + 12\]

The partition numbers for this analysis are the x coordinates where \(h''(x) = 0\) (the local extrema of \(h'\)) together with the zeros of \(h'\). Below is a graph of the derivative of \(h\), which shows these partition points at approximately -3.0, -2.3, -1.2, -0.1, 1.0, 1.8 and 2.4.

x <-  seq(-4, 4, 0.0001)
plot(c(-6, 6), c(-200, 100), type = "n", ann = FALSE)
abline(v = 0, h = 0, lty = "dashed")
lines(x, eval(h_p), col = 4)
x_intercepts <- c(-3, -1.23303, 1, 2.43304)
points(x_intercepts, rep(0, 4), pch = 21, bg = 4)

h_pp <- expression((-5)*x^3 - 3*x^2 + 21*x + 3)
extrema_pp <- x <- c(-2.30748, -0.1407, 1.84818)
points(x, eval(h_p), pch = 21, bg = 4)

Here is a sign chart showing concavity of graph of \(h\).

op <- par(c("mar", "mai"))
par(mar = rep(0, 4))

p_num <- sort(c(x_intercepts, extrema_pp)) # Partition numbers: zeros of h'(x) and of h''(x)
x <- c(-4, -2.5, -2, -1, 0, 1.5, 2, 3) # T-numbers
signs <- ifelse(eval(h_pp) < 0, "---", "+++")

plot(c(-1.2, 43), c(-2, 6), type = "n", axes = FALSE, ann = FALSE)
rect(0, 0, 5, 4, border = NA, col = "chocolate2")
rect(5, 0, 10, 4, border = NA, col = "chocolate2")
rect(10, 0, 15, 4, border = NA, col = "lightblue")
rect(15, 0, 20, 4, border = NA, col = "lightblue")
rect(20, 0, 25, 4, border = NA, col = "chocolate2")
rect(25, 0, 30, 4, border = NA, col = "chocolate2")
rect(30, 0, 35, 4, border = NA, col = "lightblue")
rect(35, 0, 40, 4, border = NA, col = "lightblue")

mid <- seq(2.5, 37.5, 5)
text(mid, rep(3.1, length(mid)), labels = c(expression(paste("(-", infinity, ", -3)")), "(-3, -2.3)", "(-2.3, -1.2)", "(-1.2, -0.1)", "(-0.1, 1)", "(1, 1.8)", "(1.8, 2.4)", expression(paste("(2.4,", infinity,")"))), cex = 0.8)
text(c(-1.2, c(mid, seq(5, 35, 5))), rep(2, length(mid)+7), labels = c("h''(x)", signs, rep(0, 7)))
text(mid, rep(0.5, 8), labels = ifelse(signs == "---", "Decreasing", "Increasing"), cex = rep(0.7, 8), srt = 45)
segments(0, 1.2, 40, 1.2)
points(seq(5, 35, 5), rep(1.2, 7), pch = 20)
text(40.3, 1.2, "x", cex = 0.9, font = 2)
text(c(-1.1, seq(5, 35, 5)), rep(0.5, 4), labels = c("h'(x)", round(p_num, 1)), cex = c(1, rep(0.7, 7)), font = c(1, rep(2, 7)))


par(mar = op$mar, mai = op$mai)

From the two sign charts we can now describe the shape of \(h\) interval by interval.

On the initial interval (\(-\infty\), -3), the derivative of \(h\) is negative and increasing (the second derivative is positive). This suggests the graph of \(h\) is decreasing with a concave upward shape on this interval.

The interval (\(-3, -1.2\)) is positive on \(h\)'s derivative; the derivative is increasing on (\(-3, -2.3\)) but decreasing on (\(-2.3, -1.2\)). This implies that the graph of \(h\) is increasing on the entire interval (\(-3, -1.2\)) but concave upward on (-3, -2.3) and concave downward on (-2.3, -1.2).

On the interval (-1.2, 1), the derivative of \(h\) is negative; it is decreasing on (-1.2, -0.1) but increasing on (-0.1, 1). This implies the graph of \(h\) is decreasing on the entire interval (-1.2, 1) but concave downward on (-1.2, -0.1) and concave upward on (-0.1, 1).

The interval (1, 2.4) is positive on the derivative of \(h\); the derivative is increasing on (1, 1.8) and decreasing on (1.8, 2.4). This suggests the graph of \(h\) on (1, 2.4) is increasing but concave upward on (1, 1.8) and concave downward on (1.8, 2.4).

The final interval (2.4, \(\infty\)) is negative on the derivative of \(h\), and the derivative is decreasing. This implies the graph of \(h\) is decreasing with a concave downward shape on this interval.

Sketch graph of \(h\)

Based on the information we now have, we can sketch the graph of \(h\) as shown below.

plot(rep(c(-5, 5), 2), c(rep(40, 2), rep(-40, 2)), type = "n", ann = FALSE)
abline(h = 0, v = 0, lty = "dashed")
x <- seq(-3.7, 2.9, 0.001)            # plotting range for the sketch
lines(x, h(x), col = 4)
x <- c(-3, -1.23303, 1, 2.43304)      # critical values of h
points(x, h(x), pch = 21, bg = 4)

A.13.2.4 Optimization: Absolute Maxima and Minima

In statistics, one crucial activity is locating minimum and maximum quantities of given entities. For example, given a cost model for construction of a school WASH (water, sanitation and hygiene) project, we might be interested in the minimum amount of funds we can use while maintaining quality.

These points are referred to as absolute maxima and minima, and locating them is a problem of optimization.

For a function \(j\) with \(x\) as the real numbers in its domain, an absolute maximum is a point \(j(a)\) such that \(j(a) \geqslant j(x)\) for all \(x\) in the domain. Similarly, an absolute minimum is a point \(j(a)\) such that \(j(a) \leqslant j(x)\) for all \(x\) in the domain.

Below is a graph with an absolute maximum.

# Absolute maxima
j <- expression(-3*n^2)
n <- seq(-2, 2, 0.0001)
plot(c(-5, 5), c(-10, 0.2), type = "n", ann = FALSE)
lines(n, eval(j), col = 4)
points(0, 0, pch = 21, bg = 4)
title("Plot with absolute maxima")
text(5, -5, expression(paste("j(x) = ", -3*x^2)), srt = 90)

Here is a plot with an absolute minimum.

# Absolute minima
sixth <- expression(x^6 - 7*x^4 + 14*x^2 - x - 5)
x <- sort(c(seq(-2.3, 2.3, 0.01), sixth_vertices))
plot(c(-5, 5), c(-5, 5), type = "n", ann = FALSE)
y <- eval(sixth)
lines(x, y, col = 4)
points(x[which.min(y)], y[which.min(y)], pch = 21, bg = 4)
title("Plot with absolute minima", line = 1)
title(xlab = "x", ylab = "h(x)", line = 2)
text(4.9, 0, labels = expression(paste("F(x) = ", x^6 - 7*x^4 + 14*x^2 - x - 5)), cex = 0.8, srt = 90)

The graph below has no absolute minimum or maximum, but it has local extrema.

# No absolute maximum or minimum but has local extrema
extr1 <- (2 - sqrt((-2)^2 - 4*3*(-14)))/6
extr2 <- (2 + sqrt((-2)^2 - 4*3*(-14)))/6
x <- sort(c(seq(-4, 4, 0.1), extr1, extr2))
ind <- c(which(x == extr1), which(x == extr2))
jx <- expression(x^3 - x^2 - 14*x + 11)
plot(x, eval(jx), type = "n", ylab = "j(x)")
abline(v = 0, h = 0, lty = "dashed")
lines(x, eval(jx), col = 4)
points(c(extr1, extr2), eval(jx)[ind], pch = 21, bg = c("chocolate2", "lightblue"))
title("No absolute minima or maxima")
text(c(extr1, extr2, 3.1), c(22, -10, 26), labels = c("Local Maximum", "Local Minimum", jx), cex = c(0.7, 0.7, 0.8))

All three graphs above have an open interval for their domain, that is (\(-\infty < x < \infty\)). This means a graph may or may not have an absolute maximum or an absolute minimum. However, if a function is continuous on a closed interval, then it must have an absolute maximum and an absolute minimum. This is often the case with statistical models, as they have intervals within which \(x\) must be contained, for example human height and body temperature.

Both the absolute maximum and the absolute minimum occur at a critical value (a local extremum) or at an end point. Each of these values is unique, but each can be attained at more than one point.

The graph of \(j\) below is continuous on the closed interval [-2, 2.1]. It has one absolute minimum at the point \(j(1.6) = -2.6\) and an absolute maximum attained at two points, \(j(-1.6) = 4.6 = j(2.1)\).

fifth_vertices <- c(-1.64443286, 1.64443286, 2.119174)
x <- sort(c(fifth_vertices, seq(-2, 2, 0.0001)))
fifth <- expression(x^5 - 5*x^3 + 4*x + 1)
plot(c(5, -5), c(5, -5), type = "n", xaxt = "n", ann = FALSE)
axis(side = 1, at = -5:5, labels = c(rep("", 3), -2, rep("", 3), 2.1, rep("", 3)))
title("Absolute minima and maxima", line = 1)
title(xlab = "x", ylab = "f(x)", line = 2)
lines(x, eval(fifth), col = 4)
x <- c(-2, fifth_vertices)
points(x, eval(fifth), pch = 21, bg = 4)
text(c(-2, -0.9, 1.6, 2.8), c(0.1, 4.63, -3, 4.6), labels = c("j(-2) = 1", "j(-1.6) = 4.6", "j(1.6) = -2.6", "j(2.1) = 4.6"), cex = 0.7)
text(c(-2, 2), rep(-5.5, 2), labels = c("[", "]"), xpd = TRUE, col = 4)
segments(-2, -5.45, 2, -5.45, col = 4, xpd = TRUE)
text(4.9, 0, labels = expression(paste("j(x) = ", x^5 - 5*x^3 + 4*x + 1)), srt = 90)

From this example it is clear that, for a closed interval, locating the absolute minimum and maximum requires us to identify the function's critical values and end points. From there we just need to identify the minimum and maximum values. But take note, the function must be continuous, like a polynomial.

Just to wrap this up, if we look at our function

\[j(x) = x^5 - 5x^3 + 4x + 1\]

on the closed interval [-2, 2.1] with critical values of about -1.6, -0.5, 0.5 and 1.6, its absolute minimum is about -2.6, and its largest value is about 4.6, which happens to occur at two places, at -1.6 and 2.1.
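
A minimal sketch of this closed-interval check in R, using the approximate critical values above:

# Evaluate j at the end points and (approximate) critical values,
# then pick the smallest and largest values
j <- function(x) x^5 - 5*x^3 + 4*x + 1
candidates <- c(-2, -1.6, -0.5, 0.5, 1.6, 2.1)
vals <- j(candidates)
candidates[which.min(vals)]; min(vals)   # absolute minimum, about -2.6 at x = 1.6
candidates[which.max(vals)]; max(vals)   # absolute maximum, about 4.6 at x = -1.6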

A.13.2.5 Constant e

Earlier in this chapter we mentioned the number \(e\) as an irrational value which can be approximated by the expression

\[\left[1 + \frac{1}{n}\right]^n\]

evaluated at large values of \(n\).

Here we want to look at it from a limits perspective. That is, number \(e\) can be formally defined as:

\[e = \lim_{n \to \infty}\left(1 + \frac{1}{n}\right)^n\]

Another definition of irrational number \(e\) is given by

\[e = \lim_{s \to 0}(1 + s)^{1/s}\]

At first glance one would expect the limit as \(s\) approaches 0 to be 1, given that 1 plus a number close to 0 is essentially 1, and 1 raised to any power is 1. However, this is not the case: evaluating the expression at small values of \(s\) approaching 0 outputs values close to \(e\), or 2.7182818.

e <- expression((1 + s)^(1/s))
s <- c(-0.5, -0.1, -0.01, -0.001, -0.0001, 0, 0.0001, 0.001, 0.01, 0.1, 0.5)
lims <- eval(e)
lims[6] <- exp(1) 
matrix(c(s, lims), nrow = 2, byrow = TRUE, dimnames = list(c("s", "(1 + s)^(1/s)"), rep("", length(s))))
##                                                                    
## s             -0.5 -0.100000 -0.010000 -0.001000 -0.000100 0.000000
## (1 + s)^(1/s)  4.0  2.867972  2.731999  2.719642  2.718418 2.718282
##                                                       
## s             0.000100 0.001000 0.010000 0.100000 0.50
## (1 + s)^(1/s) 2.718146 2.716924 2.704814 2.593742 2.25

This implies the function \((1 + s)^{1/s}\) is discontinuous at \(s = 0\).

A.13.2.6 Derivative of Logarithmic and Exponential Functions

Logarithms and exponents are widely applied in statistics; it is therefore important for us to discuss these functions as regards their derivatives and graphing techniques.

The exponential function with base \(e\) is its own derivative, that is,

\[\frac{d}{dx} e^x = e^x\]

Derivative of natural logarithm is given by

\[\frac{d}{dx} ln x = \frac{1}{x}\]

Let’s look at how derivative of natural logarithm is arrived at before looking at derivative of exponential function.

For the function \(j\) given as \(j(x) = ln \text{ } x\), we can use the definition of the derivative

\[j'(x) = \lim_{h \to 0} \frac{j(x + h) - j(x)}{h}\]

to arrive at derivative of \(j\).

We begin by simplifying our difference quotient

\[\frac{j(x + h) - j(x)}{h} = \frac{ln(x + h) - ln \text{ } x}{h}\]

Using the sixth property of logarithms, we can rewrite this difference of logarithms as the logarithm of a quotient

\[ = \frac{1}{h} ln \frac{x + h}{x}\]

Multiplying by \(x/x\), which is simply 1, transforms this equation so that we can use the seventh property of logarithms.

\[ = \frac{x}{x} . \frac{1}{h} ln \frac{x + h}{x}\]

Note, \((x + h)/x\) is the same as \((x/x) + (h/x) = 1 + h/x\), therefore

\[\frac{1}{x} [\frac{x}{h} ln(1 + \frac{h}{x})]\]

Using seventh property we get

\[ = \frac{1}{x} ln(1 + \frac{h}{x})^{x/h}\]

Without formally locating the limit, and basing this only on our knowledge of limits, we note that as \(h \to 0\) the quantity \((1 + \frac{h}{x})^{x/h}\) approaches \(e\) (set \(s = h/x\) in the definition of \(e\) above), so its logarithm approaches 1. Therefore

\[\frac{d}{dx} ln x = \lim_{h \to 0} \frac{j(x + h) - j(x)}{h}\]

will evaluate to

\[\frac{d}{dx} ln x = \frac{1}{x}\]

In the same way, we can show derivative of \(e^x\) is equal to itself (\(e^x\)).

For exponential function \(j\) given by \(j(x) = e^x\), we can simplify its difference quotient as

\[\frac{j(x + h) - j(x)}{h} = \frac{e^{x + h} - e^x}{h}\]

Using first property of exponents we get

\[ = \frac{e^xe^h - e^x}{h}\]

Factoring out \(e^x\) we get

\[ = e^x\left(\frac{e^h - 1}{h}\right)\]

And since \(\lim_{h \to 0} \frac{e^h - 1}{h} = 1\), from our computation of limits we know this will lead to

\[\frac{d}{dx} e^x = e^x\]

This simplicity of \(e^x\) makes it widely applicable in many situations.

If we are given the following function

\[j(x) = 3e^x - ln\text{ }x^2\]

we can get its derivative by starting with the derivative of each term. For the first term \(3e^x\) we use the constant-multiple rule, that is:

\[3\frac{d}{dx}e^x\]

This remains \(3e^x\). For our second term \(ln\text{ }x^2\), the seventh property of logarithms tells us the logarithm of a power is the exponent times the logarithm of the base, so \(ln \text{ } x^2 = 2 \text{ } ln \text{ } x\), and its derivative becomes:

\[2 \frac{d}{dx} ln \text{ } x = \frac{2}{x}\]

We can now say derivative of \(j\) is:

\[j'(x) = 3e^x - \frac{2}{x}\]
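
As a quick sketch, we can check this result symbolically with R's D() function:

# Derivative of j(x) = 3e^x - ln(x^2); the printed result is equivalent to 3*exp(x) - 2/x
D(expression(3*exp(x) - log(x^2)), "x")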

A.13.2.7 Graphing Techniques

Graphing exponential and logarithmic functions is fairly simple compared with polynomial functions of higher degrees. However, we can use our knowledge of derivatives to learn more about the graphs of these two functions.

From the graph of \(j\) below, we can see graphs of exponential functions are increasing and concave upward, with the x-axis as a horizontal asymptote because \(\lim_{x \to -\infty} e^x = 0\).

The graph of \(h\) shows us that graphs of logarithms are also increasing, but at a decreasing rate, thus they are concave downward. Since we cannot take logarithms of negative numbers, the domain is all positive real numbers. As \(x\) approaches 0 from the right, \(ln \text{ } x\) decreases without bound, therefore the graph of a logarithmic function has the y-axis as its vertical asymptote, that is \(\lim_{x \to 0^+} ln \text{ } x = -\infty\).

Looking at both graphs we note that they are reflections of each other along the line \(y = x\). It should also be evident that as \(x\) increases both graphs head to infinity.

An interesting point to note is that, even though both graphs increase as \(x\) increases, the graph of \(j\) rises ever more rapidly while that of \(h\) rises ever more slowly. We therefore conclude by saying the graph of an exponential function increases more rapidly than any positive power of \(x\), while a logarithmic function increases more slowly than any positive power of \(x\).

x <- seq(-5, 5, 0.001)
#log(x)
plot(c(-5, 5), c(-5, 5), type = "n", xaxt = "n", ann = FALSE)
axis(1, at = -5:5)
abline(v = 0, h = 0, lty = "dashed")
lines(-5:5, -5:5, col = "chocolate2", lty = "dashed")
lines(x[x > 0], log(x[x > 0]), col = 4)
lines(x, exp(x), col = 4)
text(1.1, exp(1.4), labels = expression(paste("j(x) = ", e^x)), srt = 45)
text(3.5, log(3.5)+.6, "h(x) = ln x", srt = 15)

A.13.3 Integration

In our introduction we mentioned calculus has two main topics, differential calculus and integral calculus. We now have the basics of differential calculus, and therefore in this section we aim to introduce or re-introduce integral calculus.

There are two types of integrals, the indefinite integral and the definite integral. We will not go into too much detail, as this chapter is aimed at re-introducing the basics of mathematics necessary for post-descriptive statistics. In this regard we shall discuss antiderivatives and the indefinite integral, followed by an introduction to the definite integral.

A.13.3.1 Antiderivatives and Indefinite Integral

Antidifferentiation involves reconstructing a function from its derivative. Two functions can have the same derivative, with the only difference between them being a constant.

For example, derivative \(6x^2\) can have these antiderivatives:

  1. \(2x^3\)
  2. \(2x^3 - 2\)
  3. \(2x^3 + 4\)
  4. \(2x^3 + \sqrt{2}\)

Note our first antiderivative can be expressed as \(2x^3 + 0\), which means all four antiderivatives can be expressed as \(2x^3 + K\), where \(K\) stands for any real number.

Given this fact, graphing the antiderivatives of any derivative will yield graphs of the same shape, the only difference being a vertical shift.

n <- seq(-2, 2, 0.0001)
j1 <- expression(2*n^3)
j2 <- expression(2*n^3 - 2)
j3 <- expression(2*n^3 + 4)
j4 <- expression(2*n^3 + sqrt(2))
plot(c(-2.5, 2.5), c(-20, 20), type = "n", ann = FALSE)
lines(n, eval(j1), col = 4, lty = "dashed", lwd = 2)
lines(n, eval(j2), col = 5, lty = "dashed", lwd = 2)
lines(n, eval(j3), col = 6, lty = "dashed", lwd = 2)
lines(n, eval(j4), col = 7, lty = "dashed", lwd = 2)
title(expression(paste("Possible graphs of antiderivatives of ", 6*x^2)))
legend("bottomright", legend = c(j1, j2, j3, j4), lty = "dashed", col = c(6:7, 4:5), lwd = 2)

From the preceding discussion we note antidifferentiation of a function leads to a whole family of functions differing only by a constant. This fact can be expressed symbolically by what is referred to as the indefinite integral.

\[\int j(x) \space{} dx\] Symbol \(\int\) is called an integral sign.

For function \(j(x)\), we can write all groups of its antiderivatives as:

\[\int j(x) \space{} dx = J(x) + K \quad{} \text{ if } \quad{} J'(x) = j(x)\]

Function \(j(x)\) here is called an integrand and symbol \(dx\) indicates that antiderivative is performed with respect to \(x\). Constant \(K\) is called constant of integration.

For our earlier example, we can express this symbolically with an indefinite integral as:

\[\int 6x^2 \space{} dx = 2x^3 + K\]

This is because:

\[\frac{d}{dx}(2x^3 + K) = 6x^2\]

To simplify the process of getting indefinite integrals of frequently used functions, four properties have been developed. These are:

For a constant integrand, the indefinite integral is given by \(\int k \space{} dx = kx + K\), where \(k\) is the constant and \(K\) the constant of integration. Example, \(\int 6 \space{} dx = 6x + K\) since \(\frac{d}{dx} (6x + K) = 6\).

When the integrand is a power function, its indefinite integral is given by \(\int x^n \space{} dx = \frac{x^{n+1}}{n+1}+K, \quad{} n \ne -1\). Example, \(\int x^3 \space{} dx = \frac{x^4}{4} + K\).

When the integrand is composed of a constant and a function, its indefinite integral is given by \(\int kj(x) \space{} dx = k \int j(x) \space{} dx\). Example, \(\int 6x^2 \space{} dx = 6 \int x^2 \space{} dx = 6 \cdot \frac{x^3}{3} = 2x^3 + K\). Do note, here we are moving a constant across the integral sign; this should not be done with variables, as in \(\int yx^2 \space{} dx \ne y \int x^2 \space{} dx\).

When the integrand is a sum or difference of functions, the indefinite integral is given by \(\int [j(x) \pm h(x)] \space{}dx = \int j(x) \space{} dx \pm \int h(x) \space{} dx\). Example, \(\int (x^2 + x^3) \space{} dx = \int x^2 \space{} dx + \int x^3 \space{} dx = \frac{x^3}{3} + \frac{x^4}{4} + K\).
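
As a small sketch, we can confirm the last example by differentiating the result with R's D(); the output simplifies back to the integrand \(x^2 + x^3\):

# Differentiating the antiderivative x^3/3 + x^4/4 recovers the integrand
D(expression(x^3/3 + x^4/4), "x")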

Antiderivatives and indefinite integrals for exponential and logarithmic functions

These functions add two more properties to our list of indefinite integral properties.

The fifth property is the indefinite integral of an exponential integrand. It is given by \(\int e^x \space{} dx = e^x + K\).

The sixth is for a reciprocal integrand, whose integral is a logarithm: \(\int \frac{1}{x} \space{} dx = ln|x| + K, \quad{} x \ne 0\).

Example, let us get indefinite integral of function below:

\[\int (3e^x + \frac{4}{x}) \space{} dx\]

It should output this function

\[3e^x + 4 \space{} ln|x| + K\]

One informative application of indefinite integrals is recovering a function given its slope (derivative) and a point on its graph.

For instance, if we are told a function \(j\) has a slope of \(6x^2\) and its graph passes through the point (2, 10), then we can establish the function as \(j(x) = 2x^3 - 6\). This is because its indefinite integral is \(2x^3 + K\), and since \(2(2)^3 = 16\) rather than 10, the constant must be \(K = 10 - 16 = -6\).
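
A minimal sketch of this in R:

# Antiderivative family of 6x^2 is 2x^3 + K; the curve passes through (2, 10)
K <- 10 - 2*2^3        # K = -6
j <- function(x) 2*x^3 + K
j(2)                   # 10, as required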

A.13.3.2 Introduction to Definite Integral

To fully comprehend the concept of probability, which is a core concept in statistics, we need to know how to determine the area under a curve. The definite integral provides us with tools to handle this kind of problem. In particular, we will use the definite integral to determine an area bounded by the graph of a function, the x-axis, and left and right vertical lines denoting the start and end of the area of interest, as shown below.

x <- seq(-2, 2, 0.0001)
h <- expression((-5)*x^2 + 13)
plot(c(-3, 3), c(0.5, 14), type = "n", xaxt = "n", ann = FALSE)
axis(1, at = c(-1, 1), labels = c("a", "b"))
left <- seq(-1, 1, abs(-1 - 1)/50)
rect(xleft = left[-length(left)], ybottom = rep(par("usr")[3], length(left)), xright = left[-1], ytop = (-5)*left[-length(left)]^2 + 13, border = NA, col = "chocolate2")
segments(c(-1, 1), c(par("usr")[3], par("usr")[3]), c(-1, 1), c(-5*(-1)^2+13, -5*(1)^2+13), col = 4, lwd = 2)
segments(-1, 0, 1, 0, col = 4, lwd = 2)
lines(x, eval(h), col = 4, lwd = 2)
legend("topright", legend = expression(paste("h(x) = ", -5*x^2 + 13)), bty = "n")

Shaded region can be symbolically represented by:

\[\int_{a}^{b} h(x) \space{} dx = (\text{Area under curve from x = a to x = b})\]

We shall begin by introducing this concept in terms of a static area, followed by a dynamic distance (with respect to time) and finally as a complete total change in area. In a subsequent section we will see how the definite integral is related to the indefinite integral (antiderivative) through the fundamental theorem of calculus.

A.13.3.2.1 Area

To get the area under a curve or graph we divide that area into equal rectangles and total their areas. Since rectangles can understate or overstate the area under a curve, our objective is to find a number of rectangles which covers the area under the curve without overestimating or underestimating it too much. We do this by defining an approximation error, which is the difference between the approximated area and the actual area of the region under consideration.

To grasp this concept, we will begin this discussion by looking at a simple function which is positive (its graph lies above the x-axis) and monotonic. Monotonic means it is either increasing or decreasing over an interval. Later we will look at an example of a graph which crosses the x-axis, that is, it has positive and negative values, and another that is positive but not monotonic (it has both increasing and decreasing intervals).

General concept of area under a curve

Suppose we have \(j(x) = 2x^2 + 2\) and we want to find the area bounded by the graph of this function, the x-axis and vertical lines at x = 1 and x = 2.5 as shown below; basically we want to determine the area of the "chocolate" colored region.

x <- seq(0.1, 5, 0.0001)
j <- expression(2*x^2 + 2)
plot(c(0.1, 4), c(0, 14.6), type = "n", xaxt = "n", ann = FALSE)
axis(1, at = c(1, 2.5), labels = c(1, 2.5))
lines(x, eval(j), col = 4, lwd = 2)
legend("topright", legend = expression(paste("j(x) = ", 2*x^2 + 2)))
xleft <- seq(1, 2.5, (2.5-1)/60)
rect(xleft = xleft[-length(xleft)], ybottom = rep(par("usr")[3], length(xleft)), xright = xleft[-1], ytop = 2*xleft[-length(xleft)]^2 + 2, border = NA, col = "chocolate2")
segments(1, par("usr")[3], 2.5, par("usr")[3], col = 4, lwd = 2)
segments(x0 = c(1, 2.5), y0 = c(par("usr")[3], par("usr")[3]), x1 = c(1, 2.5), y1 = c(2*1^2 + 2, 2*2.5^2 + 2), col = 4, lwd = 2)

As noted at the start of this section, we can only approximate this area using areas of rectangles, given that there is no simple geometric formula which can accomplish this. It is important to appreciate that this is an approximation and therefore not the exact area of the given region.

We therefore begin with an approximation of the given area using 5 equally spaced rectangles. To get equal rectangles we need to compute the width of each rectangle such that all five rectangles fit between x = 1 and x = 2.5.

\[\frac{2.5 - 1}{5} = 0.3\]

Our computation gives us a width of 0.3, denoted by \(\Delta x\). \(\Delta x\) means change in x and is read as delta x.

Height of these rectangles will be given by evaluating left hand side x value of each rectangle as shown below.

plot(c(0.1, 4), c(0, 14.6), type = "n", xaxt = "n", ann = FALSE)
axis(1, at = c(1, 2.5), labels = c(1, 2.5))
lines(x, eval(j), col = 4, lwd = 2)
legend("topright", legend = expression(paste("j(x) = ", 2*x^2 + 2)))
xleft <- seq(1, 2.5, (2.5-1)/5)
rect(xleft = xleft[-length(xleft)], ybottom = rep(par("usr")[3], length(xleft)), xright = xleft[-1], ytop = 2*xleft[-length(xleft)]^2 + 2, border = "chocolate", col = "chocolate2")
segments(1, par("usr")[3], 2.5, par("usr")[3], col = 4, lwd = 2)
segments(x0 = c(1, 2.5), y0 = c(par("usr")[3], par("usr")[3]), x1 = c(1, 2.5), y1 = c(2*1^2 + 2, 2*2.5^2 + 2), col = 4, lwd = 2)

The area of a rectangle is given by height times width; therefore to get the area of the colored region we compute the area of each rectangle and add them up. We will call this computation the left sum, because we used the x values on the left hand side of each rectangle, and denote it as \(L_n\), where \(n\) is the number of rectangles.

In our case, for each rectangle the height is the function evaluated at the left hand side x value, while the width is 0.3.

\[L_{5} = j(1)*0.3 + j(1.3)*0.3 + j(1.6)*0.3 + j(1.9)*0.3 + j(2.2)*0.3\]

\[= 1.2 + 1.614 + 2.136 + 2.766 + 3.504 = 11.22\]

We have arrived at an area of 11.22, but looking at our graph we can clearly see that we have not fully covered the colored region we sought. We have actually underestimated the intended region. In general, if a graph is increasing, then the left sum will always underestimate the given region.

We can symbolically express this as:

\[11.22 = L_{5} < \int_{1}^{2.5} (2x^2 + 2) \space{} dx = \text{Area}\]

Since the left sum for an increasing function gave us an underestimate of the area of interest, we can try using the right side. We will do this and superimpose it on our graph to see the difference.

plot(c(0.1, 4), c(0, 14.6), type = "n", xaxt = "n", ann = FALSE)
axis(1, at = c(1, 2.5), labels = c(1, 2.5))
lines(x, eval(j), col = 4, lwd = 2)
legend("topright", legend = expression(paste("j(x) = ", 2*x^2 + 2)))
xleft <- seq(1, 2.5, (2.5-1)/5)
rect(xleft = xleft[-length(xleft)], ybottom = rep(par("usr")[3], length(xleft)), xright = xleft[-1], ytop = 2*xleft[-length(xleft)]^2 + 2, border = "chocolate", col = "chocolate2")
rect(xleft = xleft[-length(xleft)], ybottom = rep(par("usr")[3], length(xleft)), xright = xleft[-1], ytop = 2*xleft[-1]^2 + 2, border = "chocolate")
segments(1, par("usr")[3], 2.5, par("usr")[3], col = 4, lwd = 2)
segments(x0 = c(1, 2.5), y0 = c(par("usr")[3], par("usr")[3]), x1 = c(1, 2.5), y1 = c(2*1^2 + 2, 2*2.5^2 + 2), col = 4, lwd = 2)

As we can see, now we have overestimated our region of interest. Its area is given by summing all areas of our rectangles with height computed using right x values. This summation is called right sum and denoted as \(R_n\) where \(n\) is number of rectangles. Right sum for five rectangles is:

\[R_5 = j(1.3)*0.3 + j(1.6)*0.3 + j(1.9)*0.3 + j(2.2)*0.3 + j(2.5)*0.3\]

\[= 1.614 + 2.136 + 2.766 + 3.504 + 4.350 = 14.37\]

In general, if graph of function we are considering is increasing, then right sum would always be an overestimate of region of interest.

Now, looking at left sum and right sum we see actual area is between these \(L_5 = 11.22\) and \(R_5 = 14.37\). We can represent this as:

\[11.22 = L_5 < \int_{1}^{2.5} (2x^2 + 2) \space{} dx < R_{5} = 14.37\]

When an unknown value is between two known values, then an average of known values can be a good approximation of unknown value. Therefore we can compute an approximate value by taking an average of \(L_5\) and \(R_5\).

\[\text{Average} = \frac{L_5 + R_5}{2} = \frac{11.22 + 14.37}{2} \approx 12.8\]

Using only five rectangles left us with a rough approximation of the area under our curve. Suppose we increased this number (\(n\)) to 10 and then to 100; visually we can see we get a better approximation as \(n\) increases.

\[\frac{2.5 -1}{10} = 0.15\]

plot(c(0.1, 4), c(0, 14.6), type = "n", xaxt = "n", ann = FALSE)
axis(1, at = c(1, 2.5), labels = c(1, 2.5))
lines(x, eval(j), col = 4, lwd = 2)
legend("topright", legend = c(expression(paste("j(x) = ", 2*x^2 + 2)), "n = 10"), bty = "n")
xleft <- seq(1, 2.5, (2.5-1)/10)
rect(xleft = xleft[-length(xleft)], ybottom = rep(par("usr")[3], length(xleft)), xright = xleft[-1], ytop = 2*xleft[-length(xleft)]^2 + 2, border = "chocolate", col = "chocolate2")
rect(xleft = xleft[-length(xleft)], ybottom = rep(par("usr")[3], length(xleft)), xright = xleft[-1], ytop = 2*xleft[-1]^2 + 2, border = "chocolate")
segments(1, par("usr")[3], 2.5, par("usr")[3], col = 4, lwd = 2)
segments(x0 = c(1, 2.5), y0 = c(par("usr")[3], par("usr")[3]), x1 = c(1, 2.5), y1 = c(2*1^2 + 2, 2*2.5^2 + 2), col = 4, lwd = 2)

\[\frac{2.5 - 1}{100} = 0.015\]

plot(c(0.1, 4), c(0, 14.6), type = "n", xaxt = "n", ann = FALSE)
axis(1, at = c(1, 2.5), labels = c(1, 2.5))
lines(x, eval(j), col = 4, lwd = 2)
legend("topright", legend = c(expression(paste("j(x) = ", 2*x^2 + 2)), "n = 100"), bty = "n")
xleft <- seq(1, 2.5, (2.5-1)/100)
rect(xleft = xleft[-length(xleft)], ybottom = rep(par("usr")[3], length(xleft)), xright = xleft[-1], ytop = 2*xleft[-length(xleft)]^2 + 2, border = "chocolate", col = "chocolate2")
rect(xleft = xleft[-length(xleft)], ybottom = rep(par("usr")[3], length(xleft)), xright = xleft[-1], ytop = 2*xleft[-1]^2 + 2, border = "chocolate")
segments(1, par("usr")[3], 2.5, par("usr")[3], col = 4, lwd = 2)
segments(x0 = c(1, 2.5), y0 = c(par("usr")[3], par("usr")[3]), x1 = c(1, 2.5), y1 = c(2*1^2 + 2, 2*2.5^2 + 2), col = 4, lwd = 2)

Computation-wise, we can bound the area of the colored region with 10 rectangles as:

\[11.97375 = L_{10} < \int_{1}^{2.5} (2x^2 + 2) \space{} dx < R_{10} = 13.54875\]

Notice distance between \(L_{10}\) and \(R_{10}\) (approximately 1.575) is smaller than that of \(L_5\) and \(R_5\) (approximately 3.15).

The average of \(L_{10}\) and \(R_{10}\) is approximately 12.8.

For 100 rectangles, the area is bounded as:

\[12.67136 = L_{100} < \int_{1}^{2.5}(2x^2 + 2) \space{} dx < R_{100} = 12.82886\]

We can now see the difference between \(L_{100}\) and \(R_{100}\) is much smaller (about 0.16), with an average of 12.75011 (about 12.8).
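
As a sketch, these left and right sums, their averages, and the exact area can be computed in R:

# Left and right sums for j(x) = 2x^2 + 2 over [1, 2.5]
j <- function(x) 2*x^2 + 2
riemann <- function(n, a = 1, b = 2.5) {
  dx <- (b - a)/n
  x <- seq(a, b, by = dx)
  L <- sum(j(x[-length(x)])) * dx   # left sum
  R <- sum(j(x[-1])) * dx           # right sum
  c(L = L, R = R, average = (L + R)/2)
}
riemann(5); riemann(10); riemann(100)
integrate(j, lower = 1, upper = 2.5)  # exact area is 12.75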

In conclusion, approximating the area under a curve with \(L_n\) or \(R_n\) is bound to produce an error; these errors are what we see as the white boxes in our graph. Therefore, to make a better approximation we need to set an allowable error of approximation, which is the difference between the approximated value and the actual value.

In our subsequent section we reason out some formulas to compute error bounds for any approximation thereby enabling us to determine an ideal value of \(n\).

Error in Approximation

Let us begin by defining the error for a positive monotone function which is decreasing over the interval [1, 2.5], before having a look at a positive monotone function which is increasing.

For a monotone function, the area under the graph, or colored region, lies between \(L_n\) and \(R_n\). We are mentioning monotone in particular because functions that are not monotone (those with both increasing and decreasing intervals, like most polynomials) require a different method for determining the approximation error.

Graphically, this area between left sum and right sum is shown as uncolored rectangles.

j <- expression(-0.8*x^2 + 20)
plot(c(0.1, 4), c(0.1, 21), type = "n", xaxt = "n", yaxt = "n", ann = FALSE)
axis(1, at = c(1, 2.5), labels = c(1, 2.5))
axis(2, at = c((-0.8)*2.5^2 + 20, (-0.8)*1^2 + 20), labels = c("b", "a"))
lines(x, eval(j), col = 4, lwd = 2)
xleft <- seq(1, 2.5, (2.5-1)/5)
rect(xleft = xleft[-length(xleft)], ybottom = rep(par("usr")[3], length(xleft)), xright = xleft[-1], ytop = -0.8*xleft[-length(xleft)]^2 + 20, border = "chocolate")
rect(xleft = xleft[-length(xleft)], ybottom = rep(par("usr")[3], length(xleft)), xright = xleft[-1], ytop = -0.8*xleft[-1]^2 + 20, border = "chocolate", col = "chocolate2")
segments(1, par("usr")[3], 2.5, par("usr")[3], col = 4, lwd = 2)
segments(x0 = c(1, 2.5), y0 = c(par("usr")[3], par("usr")[3]), x1 = c(1, 2.5), y1 = c(-0.8*1^2 + 20, -0.8*2.5^2 + 20), col = 4, lwd = 2)
segments(x0 = c(par("usr")[1], par("usr")[1]), y0 = c((-0.8)*2.5^2 + 20, (-0.8)*1^2 + 20), x1 = c(2.5, 1), y1 = c((-0.8)*2.5^2 + 20, (-0.8)*1^2 + 20), lty = 2)

Since \(\int_{1}^{2.5} j(x) \space{}dx\) is in between \(L_n\) and \(R_n\), we know the error is less than the area of the uncolored region.

Numerically, we can bound the error by the absolute difference between the left sum \(L_n\) and the right sum \(R_n\).

Therefore, for this area \(\int_{1}^{2.5} j(x) \space{}dx\) with

\(L_n = 26.712\) and \(R_n = 25.452\)

The total uncolored area is 1.26. Since this is the total area of all the uncolored rectangles, we can also compute it as height times width. The height is the distance from \(j(2.5)\) to \(j(1)\), while the width is the change in \(x\) for each rectangle, that is:

\[|j(2.5) - j(1)| * \frac{2.5 - 1}{5} = 1.26\]

In general, we can denote our second \(x\) value as \(b\) and our first as \(a\) and therefore express this area as:

\[|j(b) - j(a)|\Delta{x}\] where

\[\Delta{x} = \frac{b-a}{n}\]

\(n = \text{ number of rectangles}\)

From this, we note that the error is less than or equal to the total area between \(L_n\) and \(R_n\), that is, the total area of the uncolored region.

\[\text{Error} \leqslant |R_n - L_n| = |j(b) - j(a)|\Delta{x}\]

Given what we now know of the error, we can also note that the absolute difference between the area under the curve and the left or right sum will always be less than or equal to \(|R_n - L_n|\), or \(|j(b) - j(a)|\Delta{x}\).

If we take the average of \(L_n\) and \(R_n\), its error will be less than or equal to half the uncolored region, that is, half the absolute difference between \(L_n\) and \(R_n\).

We can summarize all this with some formulas for monotonic functions.

For a closed interval [a, b], we can denote area under a curve \(\int_{a}^{b} j(x) \space{}dx\) as \(I\) and average of \(L_n\) and \(R_n\) as \(A_n\) and therefore state:

\[|I - L_n| \leqslant |j(b) - j(a)|\frac{b-a}{n}\]

\[|I - R_n| \leqslant |j(b) - j(a)|\frac{b-a}{n}\]

\[|I - A_n| \leqslant |j(b) - j(a)| \frac{b-a}{2n}\]
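
To make this concrete, here is a minimal R check of these bounds (a sketch, not part of the original example) for the decreasing function \(j(x) = -0.8x^2 + 20\) on [1, 2.5] with \(n = 5\); R's integrate() supplies a reference value for the area.

j_fun <- function(x) -0.8*x^2 + 20
a <- 1; b <- 2.5; n <- 5
delta_x <- (b - a)/n
x_pts <- seq(a, b, by = delta_x)
L_n <- sum(j_fun(x_pts[-(n + 1)])) * delta_x   # left sum, 26.712
R_n <- sum(j_fun(x_pts[-1])) * delta_x         # right sum, 25.452
A_n <- (L_n + R_n)/2
I <- integrate(j_fun, a, b)$value              # reference value of the area
bound <- abs(j_fun(b) - j_fun(a)) * delta_x    # |j(b) - j(a)| * delta x = 1.26
c(abs(I - L_n) <= bound, abs(I - R_n) <= bound, abs(I - A_n) <= bound/2)
## [1] TRUE TRUE TRUE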

We now have the basics of error bounds for left and right sums of monotonic functions, as well as their averages. Now let us look at a positive, monotonically increasing function and apply what we have discussed.

Example

We are given

\[h(x) = 0.5x^2 + 2 \qquad{} 0 \leqslant x \leqslant 6\]

and we are asked to compute the error bounds of \(L_5\), \(R_5\) and \(A_5\). We are also asked to determine \(n\) such that the approximation of \(\int_{1}^{4} (0.5x^2 + 2) \space{} dx\) is within 0.05 of the true value.

Let us begin by graphing this function, including the right and left rectangles. Remember, for an increasing monotone function, left rectangles underestimate the area under the graph while right rectangles overestimate it, hence we color the left rectangles and not the right ones so that we can see the difference between \(L_n\) and \(R_n\).

x <- seq(0, 6, 0.0001)
h <- expression(0.5*x^2 + 2)
plot(c(1, 5.5), c(0.2, 15), type = "n", xaxt = "n", ann = FALSE)
axis(1, at = c(1, 4), labels = c(1, 4))
lines(x, eval(h), col = 4, lwd = 2)
deltax <- (4-1)/5
xleft <- seq(1, 4, deltax)
# left-sum rectangles (filled): for an increasing function these underestimate the area
rect(xleft = xleft[-length(xleft)], ybottom = rep(par("usr")[3], length(xleft)), xright = xleft[-1], ytop = 0.5*xleft[-length(xleft)]^2 + 2, border = "chocolate", col = "chocolate2")
# right-sum rectangles (outline only): these overestimate the area
rect(xleft = xleft[-length(xleft)], ybottom = rep(par("usr")[3], length(xleft)), xright = xleft[-1], ytop = 0.5*xleft[-1]^2 + 2, border = "chocolate")
segments(1, par("usr")[3], 4, par("usr")[3], col = 4, lwd = 2)
segments(x0 = c(1, 4), y0 = c(par("usr")[3], par("usr")[3]), x1 = c(1, 4), y1 = c(0.5*1^2 + 2, 0.5*4^2 + 2), col = 4, lwd = 2)

Now we can make these computations

\(L_5 = 14.34\), \(R_5 = 18.84\) and \(A_5 = 16.59\)
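
These values can be reproduced with a few lines of R (a quick sketch using the same \(h\), interval and \(n\) as above):

h_fun <- function(x) 0.5*x^2 + 2
deltax <- (4 - 1)/5
xp <- seq(1, 4, by = deltax)
L5 <- sum(h_fun(xp[-6])) * deltax   # heights taken at the left endpoints
R5 <- sum(h_fun(xp[-1])) * deltax   # heights taken at the right endpoints
A5 <- (L5 + R5)/2
c(L5 = L5, R5 = R5, A5 = A5)
##    L5    R5    A5 
## 14.34 18.84 16.59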

Error bound for \(L_5\) and \(R_5\) would be similar, that is

\[\text{Error} \leqslant |h(4) - h(1)| \Delta{x} = |10-2.5|(0.6) = 4.5\]

Error bound for \(A_5\) would be:

\[\text{Error} \leqslant \frac{4.5}{2} = 2.25\]

The last part of our challenge requires us to determine \(n\) for \(L_n\) and \(R_n\) such that Error \(\leqslant 0.05\). In other words, we want the total area of the uncolored region to be less than or equal to 0.05.

\[|h(b) - h(a)| \frac{b - a}{n} \leqslant 0.05\]

Substituting what we know, we get:

\[|10 - 2.5|*\frac{4-1}{n} \leqslant 0.05\]

\[7.5*\frac{3}{n} \leqslant 0.05 \]

We can solve for n as

\[22.5 \leqslant 0.05n \qquad{} \therefore n \geqslant 450\]
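
A quick sanity check in R (a sketch, assuming the same \(h\) as above) confirms that \(n = 450\) brings the bound down to exactly 0.05:

h_fun <- function(x) 0.5*x^2 + 2
n <- 450
abs(h_fun(4) - h_fun(1)) * (4 - 1)/n   # error bound for L_n or R_n
## [1] 0.05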

Area under a curve for a graph crossing x-axis

Graphs that cross the x-axis have negative heights on the intervals below the x-axis, which means those intervals have negative area. To get the total area of the regions below and above the x-axis, the two areas are summed, but only after the region below the x-axis is multiplied by negative one to make it positive.

As an example, let us look at

\[\int_{1}^{3} (x^2 - 4) \space{} dx \]

for \(n = 5\) and domain \(0 \leqslant x \leqslant 6\).

The graph below shows the two colored regions (A and B) whose areas we seek.

x <- seq(0, 6, 0.0001)
jan <- function(z) z^2 - 4
plot(c(0, 5), c(-5, 15), type = "n", xaxt = "n", ann = FALSE)
axis(1, at = c(1, 3), labels = c(1, 3))
abline(h = 0, lty = 2)
lines(x, jan(x), col = 4, lwd = 2)
deltax <- (3-1)/5
xleft <- sort(c(seq(1, 3, deltax), 2))
# Rectangles below x-axis
rect(xleft = xleft[1:3], ybottom = jan(xleft[1:3]), xright = xleft[2:4], ytop = c(0, 0, 0), border = "chocolate")
rect(xleft = xleft[2:4], ybottom = jan(xleft[2:4]), xright = xleft[1:3], ytop = c(0, 0, 0), border = "chocolate", col = "chocolate2")
# Rectangles above x-axis
rect(xleft = xleft[4:6], ybottom = c(0, 0, 0), xright = xleft[5:7], ytop = jan(xleft[4:6]), border = "chocolate", col = "chocolate2")
rect(xleft = xleft[5:7], ybottom = c(0, 0, 0), xright = xleft[4:6], ytop = jan(xleft[5:7]), border = "chocolate")
segments(x0 = c(1, 1.8), c(0, 0), c(1.8, 3), c(0, 0), col = 4, lwd = 2)
segments(x0 = c(1, 3), y0 = c(jan(1), 0), x1 = c(1, 3), y1 = c(0, jan(3)), col = 4, lwd = 2)
text(c(1.2, 2.8), c(-1.4, 1.9), labels = c("A", "B"))

We begin by computing the change in x, which is the width of each rectangle

\(\Delta{x} = \frac{3-1}{5} = 0.4\)

Now let us get area under graph above x-axis (positive area)

\(L_{5^+} = 1.44\), \(R_{5^+} = 3.44\) and \(A_{5^+} = 2.44\)

For the area below the x-axis we get

\(L_{5^-} = -2.32\), \(R_{5^-} = -1.12\) and \(A_{5^-} = -1.72\)

As mentioned, this area is negative because the heights there are negative. Therefore, to compute the total (unsigned) area between the graph of \(x^2 - 4\) and the x-axis from x = 1 to x = 3, we multiply the area below the x-axis by negative one and add it to the area above the x-axis:

\[\text{Total area} = -A + B = -(-1.72) + 2.44 = 4.16\]

The definite integral itself keeps the signs, so our estimate of \(\int_{1}^{3} (x^2 - 4) \space{} dx\) is \(A + B = -1.72 + 2.44 = 0.72\).
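
Our five-rectangle estimates are reasonably close to the exact values, which we can confirm with R's integrate() function (a quick check, not part of the original exercise); integrating \(|x^2 - 4|\) gives the total unsigned area:

jan <- function(z) z^2 - 4
integrate(jan, 1, 3)$value                      # signed area (the definite integral)
## [1] 0.6666667
integrate(function(z) abs(jan(z)), 1, 3)$value  # total unsigned area
## [1] 4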

Error bound and estimation of \(n\) are done as we did for graphs that are all positive or negative.

Area under a curve for a non-monotonic graph

For positive or negative non-monotonic graphs (those that both increase and decrease), approximations from \(L_n\) and \(R_n\) would not be accurate, since the area under the graph may not lie between these two values. For this reason, an error bound estimated from such approximations would not be accurate either.

In such a case it is best to subdivide the graph into intervals such that each interval is monotonic (either increasing or decreasing). This is where a bit of knowledge of derivatives comes in handy, as sketched below.
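
As an illustration (a minimal sketch using a hypothetical cubic, not one of the functions discussed above), we can locate where a non-monotone function changes direction by checking where the sign of a numerical derivative flips, and then treat each monotone piece separately:

f  <- function(x) x^3 - 6*x^2 + 9*x + 1        # increases, decreases, then increases again
xg <- seq(0, 6, by = 0.001)
df <- diff(f(xg))/diff(xg)                     # forward-difference approximation of f'(x)
turns <- xg[which(diff(sign(df)) != 0) + 1]    # x values where the slope changes sign
turns   # close to the true turning points at x = 1 and x = 3 (up to grid resolution)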

A.13.3.2.2 Determining Distance

Central to calculus is reversing or undoing derivatives, and in this case we want to determine distance given a rate. Recall that in our introduction to derivatives we mentioned that the rate of change of position at any one point in time (the instantaneous rate of change) is given by the difference quotient:

\[\frac{dp}{dt} = \lim_{\Delta{t} \to 0}\frac{p(t + \Delta{t}) - p(t)}{\Delta{t}}\]

Where \(p\) is position of a moving object at time \(t\) and average rate is measured as:

\[\text{Average rate }= \frac{\text{Total distance}}{\text{Elapsed time}}\]

From this equation we can make distance our subject such that

\[\text{Total distance } = \text{Average rate} \times \text{Elapsed time}\]

Using this formula, we want to revisit our example on jogging rate and distance. This time we want to establish the distance covered between two time points.

Now, suppose our jogger has been jogging for about 5 hours at an average speed of 8 kilometers per hour and we want to know distance covered between her first and fourth hour, we can use our knowledge on definite integral to establish this.

The jogger's rate of jogging can be expressed as

\[j(t) = 8t\] where \(j(t)\) is the rate in kilometers per hour at the end of \(t\) hours. This is an increasing function and, as we saw earlier, the area under the graph of an increasing function is underestimated by \(L_n\) but overestimated by \(R_n\). Therefore, to establish \(\int_{1}^{4} j(t) \space{} dt\), we compute the area as the average of \(L_n\) and \(R_n\) and compute its error bound.

In this regard, for \(n = 5\) we can get these estimates.

hg <- function(t) 8*t                       # jogging rate at time t
interval <- c(1, 4)
n <- 5
delta_t <- diff(interval)/n
t <- seq(interval[1], interval[2], by = delta_t)
L5 <- sum(hg(t[1:5])) * delta_t             # left sum (underestimate)
R5 <- sum(hg(t[2:6])) * delta_t             # right sum (overestimate)
A5 <- (L5 + R5)/2                           # average of the two sums
height <- hg(interval[2]) - hg(interval[1])
Error_A5 <- abs(height) * (diff(interval)/(2 * n))
allowed_error <- 2
n_Error <- ((abs(height) * diff(interval))/2)/allowed_error   # smallest n meeting a 2km error target

Width of each rectangle will be:

\[\Delta{t} = \frac{b-a}{n} = \frac{4 - 1}{5} = 0.6\]

\[L_{5} = j(1)\Delta{t} + j(1.6)\Delta{t} + j(2.2)\Delta{t} + j(2.8)\Delta{t} + j(3.4)\Delta{t} = 52.8\]

\[R_5 = j(1.6)\Delta{t} + j(2.2)\Delta{t} + j(2.8)\Delta{t} + j(3.4)\Delta{t} + j(4)\Delta{t}= 67.2\]

\[A_5 = \frac{L_5 + R_5}{2} = \frac{52.8 + 67.2}{2} = 60\text{km}\]

The approximate distance covered by our jogger between hours 1 and 4 is 60km.

Error bound for this approximation is given by:

\[\text{Error} \leqslant |j(4) - j(1)|\frac{4-1}{2 \cdot 5} = |32 - 8|(0.3) = 7.2\text{km}\]

This error bound tells us our jogger could have covered a distance between 52.8km and 67.2km. We can represent this as:

\[\text{Distance traveled from t = 1 to t = 4} = \int_{1}^{4} j(t)\space{} dt = 60 \pm 7.2\text{km}\]
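
Because \(j(t) = 8t\) is linear, the average \(A_5\) actually lands on the exact value; a one-line check with R's integrate() function confirms this:

integrate(function(t) 8*t, lower = 1, upper = 4)$value
## [1] 60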

Suppose we wanted an estimate with at most a 2km error; we would need to solve this inequality

\[|j(4) - j(1)| \frac{4-1}{2n} \leqslant 2\]

This gives us \(n \geqslant 18\).
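
We can confirm in R that \(n = 18\) meets the 2km target for the averaged sum:

n <- 18
abs(8*4 - 8*1) * (4 - 1)/(2 * n)   # error bound for A_n
## [1] 2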

A.13.3.2.3 Total change

Most practical situations, and certainly those we will deal with in applied statistics, require estimating total change from a rate of change. For example, suppose we know our jogger's speed increases up to a certain point and then decreases until she finishes; one thing we might be interested in is the total distance covered over the decreasing interval. Determining the interval where her speed is decreasing requires the derivative of her jogging function. Since we are assuming a real-life situation, we take it that we do not have a function; instead we have a table showing the instantaneous rate of increase and decrease (\(j'(t)\)) for her 8km jog.

jogger <- matrix(c(15, 20, 25, 30, 25, 20, 15, 10, 5), nrow = 1, dimnames = list("j'(t)", 0:8))
jogger
##        0  1  2  3  4  5  6  7 8
## j'(t) 15 20 25 30 25 20 15 10 5

This table tells us that at the start of the jogging session, with no distance covered, her rate of increase is 15km/h; after 1 kilometer it is 20km/h; after 2 kilometers it is 25km/h; and after 3 kilometers it peaks at 30km/h.

From 4 kilometers to 8 kilometers she is increasing at a decreasing rate, that is, after 4 kilometers the rate is only 25km/h, after 5 kilometers only 20km/h, after 6 kilometers 15km/h, after 7 kilometers only 10km/h, and after 8 kilometers only 5km/h.

Given this data, we are interested in the total distance covered while her rate is decreasing, that is, between \(t = 3\) and \(t = 8\).

jog <- function(t) -0.5*t^2 + 3.5*t + 15
t <- seq(0, 8, 0.001)
n_t <- length(t)
jogged <- jog(t)
pt_x <- c(seq(0, 3.5, length.out = 4), seq(3.5, 8, length.out = 6))
pt_y <- jog(pt_x)
plot(c(0.3, 8.4), c(0, 23), type = "n", xaxt = "n", yaxt = "n", xlab = "Distance (km)", ylab = "Rate of jogging")
axis(1, pt_x[-4], labels = 0:(length(pt_x)-2))
axis(2, seq(1, 21, 2))
rect(xleft = t[which(t == 3.5):(n_t-1)], ybottom = rep(par("usr")[3]+0.2, length(which(t == 3.5):(n_t-1))), xright = t[t > 3.5], ytop = jog(t[t > 3.5]), border = "chocolate", col = "chocolate2")

lines(t, jog(t), col = 4, lwd = 2)
segments(x0 = c(3.5, 8), y0 = c(par("usr")[3], par("usr")[3]), x1 = c(3.5, 8), y1 = jog(c(3.5, 8)), col = 4, lwd = 2, lty = 2)
points(pt_x, pt_y, pch = 21, bg = 4, xpd = TRUE)

interval <- c(3, 8)
n <- 5                                      # five subintervals between the six tabulated values
jogger <- as.vector(jogger)                 # rates at t = 0, 1, ..., 8 (positions 1 to 9)
delta_t <- diff(interval)/n                 # width of each rectangle = 1
L5 <- sum(jogger[4:8]) * delta_t            # left sum: rates at t = 3, ..., 7
R5 <- sum(jogger[5:9]) * delta_t            # right sum: rates at t = 4, ..., 8
A5 <- (L5 + R5)/2
Error_A5 <- abs(jogger[9] - jogger[4]) * (diff(interval)/(2*n))

We will approximate this total distance covered with \(A_n\), the average of \(L_n\) and \(R_n\). We let \(n\) be 5 since we have six tabulated rates from \(t = 3\) to \(t = 8\), which give five subintervals. We denote the jogger's rate of increase by \(j'(t)\).

\(\Delta{t}\), the width of each rectangle, is \(\frac{8-3}{5} = 1\) kilometer.

\[L_5 = 30\Delta{t} + 25\Delta{t} + 20\Delta{t} + 15\Delta{t} + 10\Delta{t} = 100\text{km} \quad{} \text{Overestimated distance}\]

\[R_5 = 25\Delta{t} + 20\Delta{t} + 15\Delta{t} + 10\Delta{t} + 5\Delta{t} = 75\text{km} \quad{} \text{Underestimated distance}\]

\[A_5 = \frac{L_5 + R_5}{2} = \frac{100+75}{2} = 87.5\text{km} \quad{} \text{Approximated (average) distance}\]

Estimated error of approximation for this average distance is given by

\[\text{Error} \leqslant |j'(8)-j'(3)|\frac{8-3}{2n}=|5-30|\frac{5}{2 \cdot 5}=12.5\text{km}\]

We can now conclude by noting that the additional (total) distance covered by our jogger while her rate is decreasing is

\[\int_{3}^{8} j'(t)\space{} dt = 87.5\text{km} \pm 12.5\text{km}\]

A.13.3.3 Fundamental Theorem of Calculus

In this section we will get into a somewhat formal definition of the definite integral, although we will not go into great depth, as this appendix is only meant to serve as a refresher rather than a full-length introduction to mathematics or calculus.

For a continuous function \(j\) on a closed interval [a, b], we note:

  1. The closed interval [a, b] contains ordered values increasing from \(a\) to \(b\) such that \(a = x_0 < x_1 < \dots < x_{n-1} < x_n = b\), where \(n\) is the number of sub-intervals.
  2. The length of each subinterval is \(\Delta{x_k} = x_k - x_{k-1}\), where \(k\) indexes the subintervals, \(k = 1, 2, 3, \dots, n\).
  3. As \(n\) becomes very large, approaching infinity, the length of each subinterval tends to 0, that is, \(\Delta{x_k} \to 0 \quad{}\text{as}\quad{} n \to \infty\).
  4. Finally, we select one point in each of the \(n\) sub-intervals, denoted \(c_k\), such that \(x_{k-1} \leqslant c_k \leqslant x_k\).

With that we can formally define definite integral as

\[\int_{a}^{b} j(x)\space{}dx = \lim_{n \to \infty} \sum^{n}_{k=1} j(c_k) \Delta{x_k}\]

We call this the definite integral of \(j\) from \(a \text{ to } b\). Here the integrand is \(j(x)\), the lower limit is \(a\) and the upper limit is \(b\).

It is also worth noting that, other than \(L_n\) and \(R_n\), we can approximate the area under a graph using the midpoint sum \(M_n\), which follows a similar computation.
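
As a sketch of how such sums can be coded (the helper riemann() below is ours, not from the text), here are left, right and midpoint sums for \(h(x) = 0.5x^2 + 2\) on [1, 4]; all three converge to the exact value 16.5 as \(n\) grows, with the midpoint sum converging fastest.

riemann <- function(f, a, b, n, type = c("left", "right", "mid")) {
  type <- match.arg(type)
  dx <- (b - a)/n
  pts <- switch(type,
                left  = a + (0:(n - 1)) * dx,
                right = a + (1:n) * dx,
                mid   = a + ((1:n) - 0.5) * dx)
  sum(f(pts)) * dx
}
h_fun <- function(x) 0.5*x^2 + 2
# left and right sums bracket the true value; the midpoint sum sits very close to it
sapply(c("left", "right", "mid"), function(s) riemann(h_fun, 1, 4, n = 500, type = s))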

Just like indefinite integrals, definite integrals have handy properties which can be used to compute the value of an integral; two of them are checked numerically after the list below.

Properties of definite integrals

  1. Zero: when the lower and upper limits are the same, that is \(a = b\), the area is 0; \(\int_{a}^{b} j(x)\space{}dx = 0\)
  2. Reverse limits: swapping the upper and lower limits flips the sign of the integral; \(\int_{a}^{b}j(x)\space{}dx = -\int_{b}^{a}j(x)\space{}dx\)
  3. Constant multiple: like the indefinite integral, the definite integral of a constant times a function is the constant times the integral of the function; \(\int_{a}^{b}kj(x)\space{}dx = k \int_{a}^{b}j(x)\space{}dx,\) where \(k\) is a constant
  4. Addition: also like indefinite integrals, the integral of a sum (or difference) of two functions is the sum (or difference) of their integrals; \(\int_{a}^{b}[j(x) \pm h(x)]\space{} dx = \int_{a}^{b}j(x)\space{}dx \pm \int_{a}^{b}h(x)\space{}dx\)
  5. Internal addition: an integral can be split at a point \(c\) between \(a \text{ and } b\), and the two pieces add up to the original integral; \(\int_{a}^{b}j(x)\space{}dx = \int_{a}^{c}j(x)\space{}dx + \int_{c}^{b}j(x)\space{}dx\)
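
To see a couple of these properties in action numerically (a sketch using \(j(x) = x^2 - 4\) and R's integrate() function, with an arbitrary constant and split point of our own choosing):

j_fun <- function(x) x^2 - 4
a <- 1; b <- 3; cc <- 2; k <- 5
# constant multiple: integral of k*j equals k times the integral of j
all.equal(integrate(function(x) k * j_fun(x), a, b)$value,
          k * integrate(j_fun, a, b)$value)
## [1] TRUE
# internal addition: splitting the interval at cc gives the same total
all.equal(integrate(j_fun, a, b)$value,
          integrate(j_fun, a, cc)$value + integrate(j_fun, cc, b)$value)
## [1] TRUE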

Examples

The graph of \(j\) below has four colored regions; we will use their areas to compute the integrals that follow.

jg <- function(x) -x^5 - x^4 + 14*x^3 + 6*x^2 - 45*x -3
vertices <- c(-3, -1.23303, 1, 2.43303)
x <- sort(c(seq(-4, 3.5, 0.0001), vertices))
plot(c(-5, 5), c(-40, 40), type = "n", ann = FALSE, xaxt = "n", yaxt = "n")
y <- jg(x)
y0 <- c(-3.49, -2.28, -0.07, 2.15, 2.66)
lims <- sort(c(vertices, y0))
axis(1, at = y0, labels = letters[1:length(y0)])
axis(2, at = 0)

# x values falling within each pair of consecutive limits
areas <- lapply(1:(length(lims)-1), function(i) x[x >= lims[i] & x <= lims[i+1]])
n <- sapply(areas, length)

# Colouring the area between the graph and the x-axis for the given limits;
# invisible() suppresses the NULLs that rect() returns
invisible(lapply(c(1, 5), function(i) rect(xleft = areas[[i]][-n[i]], ybottom = 0, xright = areas[[i]][-1], ytop = jg(areas[[i]][-n[i]]), border = "lightblue", col = "lightblue")))
invisible(lapply(c(2, 6), function(i) rect(xleft = areas[[i]][-n[i]], ybottom = 0, xright = areas[[i]][-1], ytop = jg(areas[[i]][-1]), border = "lightblue", col = "lightblue")))
invisible(lapply(c(3, 7), function(i) rect(xleft = areas[[i]][-n[i]], ybottom = 0, xright = areas[[i]][-1], ytop = jg(areas[[i]][-n[i]]), border = "chocolate2", col = "chocolate2")))
invisible(lapply(c(4, 8), function(i) rect(xleft = areas[[i]][-n[i]], ybottom = 0, xright = areas[[i]][-1], ytop = jg(areas[[i]][-1]), border = "chocolate2", col = "chocolate2")))

# Graph of j
lines(x, y, col = 4)
abline(h = 0)
text(y0, 0, labels = "|", font = 2) 
legend("topright", legend = "y = j(x)", bty = "n", text.font = 2)
text(vertices, c(-10, 17, -10, 10), labels = LETTERS[1:length(vertices)])

We are told the areas of these regions under the graph of \(j\) are

  • Area of \(A\) is 3.21
  • Area of \(B\) is 3.56
  • Area of \(C\) is 3.61
  • Area of \(D\) is 0.6

We are asked to compute these integrals:

  1. \(\int_{a}^{b} j(x)\space{}dx\)
  2. \(\int_{b}^{a} j(x)\space{}dx\)
  3. \(\int_{b}^{d} j(x)\space{}dx\)
  4. \(\int_{b}^{b} j(x)\space{}dx\)
  5. \(\int_{d}^{e} 4j(x)\space{}dx\)
  6. \(\int_{c}^{d} \frac{j(x)}{-3}\space{}dx\)

Using the properties of definite integrals, and remembering that regions below the x-axis (A and C) contribute negatively to an integral, we should arrive at

  1. \(\int_{a}^{b} j(x)\space{}dx = -A = -3.21\)
  2. \(\int_{b}^{a} j(x)\space{}dx = -\int_{a}^{b} j(x)\space{}dx = 3.21\)
  3. \(\int_{b}^{d} j(x)\space{}dx = B - C = 3.56 - 3.61 = -0.05\)
  4. \(\int_{b}^{b} j(x)\space{}dx = 0\)
  5. \(\int_{d}^{e} 4j(x)\space{}dx = 4D = 4 \times 0.6 = 2.4\)
  6. \(\int_{c}^{d} \frac{j(x)}{-3}\space{}dx = \frac{1}{-3}\int_{c}^{d} j(x)\space{}dx = \frac{-3.61}{-3} \approx 1.2\)

Relationship between definite and indefinite integral

In this section we want to look at the difference and the connection between the definite and indefinite integral, which will lead us to the fundamental theorem of calculus.

Definite integrals differ from indefinite integrals in that a definite integral outputs a real number while an indefinite integral outputs a set of functions. We can also view the definite integral as a geometric notion and the indefinite integral as an algebraic notion. Finally, the two apply to different sets of functions.

The core connection between the definite and indefinite integral is that, for a continuous function on a closed interval, the area under its graph can also be obtained as the difference of its antiderivative evaluated at the endpoints of the interval. That is, for a continuous function \(m\) on a closed interval \([j, h]\) with antiderivative \(M\), the following holds

\[\int_{j}^{h} m(x)\space{}dx = M(x)|_{j}^{h} = M(h) - M(j)\]

where \(M'(x) = m(x)\) and \(M(x)|_{j}^{h}\) represents the total change in \(M(x)\) from \(x = j\) to \(x = h\).

As an example, let us evaluate this definite integral

\[\int_{1}^{5} (2e^x + 6x^2 + \frac{2}{x} - 3)\space{}dx\]

Properties three and four of definite integrals lead us to

\[2\int_{1}^{5}e^x\space{}dx + 6\int_{1}^{5}x^2\space{}dx + 2\int_{1}^{5}\frac{1}{x}\space{}dx - \int_{1}^{5}3\space{}dx\]

From fundamental theorem of calculus, we can get antiderivatives and evaluate them at limits.

\[2e^x\Big|_{1}^{5} + 6\frac{x^3}{3}\Big|_{1}^{5} + 2\ln|x|\Big|_{1}^{5} - 3x\Big|_{1}^{5}\]

a <- (2*exp(5) - 2*exp(1)) + (6*(5^3/3) - 6*(1^3/3)) + (2*log(5) - 2*log(1)) - (3*5 - 3*1)   # about 530.61

Which gives us

\[(2e^5 - 2e^{1}) + \left(6\frac{5^3}{3} - 6\frac{1^3}{3}\right) + (2\ln(5) - 2\ln(1)) - (3 \cdot 5 - 3 \cdot 1) \approx 530.61\]
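
We can double-check this value numerically with R's integrate() function:

integrate(function(x) 2*exp(x) + 6*x^2 + 2/x - 3, lower = 1, upper = 5)$value
## [1] 530.6086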
