Menu

9. Artificial Intelligence, Data Science, and Internet of Things (ACtE09)

Computer Engineering (Nepal Engineering Council) - Engineering Licence Exam

This chapter provides a comprehensive overview of Artificial Intelligence (AI) and its core concepts, from intelligent agents and problem-solving techniques to knowledge representation, expert systems, and natural language processing. It delves into the principles of machine learning, covering supervised, unsupervised, and reinforcement learning, and explores the architecture and algorithms of neural networks, including deep learning fundamentals.

No MCQ questions available for this chapter.

9. Artificial Intelligence, Data Science, and Internet of Things (ACtE09)

9.1 Introduction to AI and Intelligent Agent

Artificial Intelligence (AI) is a broad field of computer science dedicated to creating machines that can perform tasks that typically require human intelligence. It encompasses various sub-disciplines aimed at simulating cognitive functions like learning, problem-solving, and pattern recognition.

Concept of Artificial Intelligence

  • Strong AI vs. Weak AI:
    • Strong AI (or AGI - Artificial General Intelligence) aims to create machines with human-like cognitive abilities, capable of performing any intellectual task that a human can. It implies consciousness and sentience.
    • Weak AI (or Narrow AI) focuses on creating systems that can perform specific tasks intelligently. Most of today's AI systems, like virtual assistants or recommendation engines, fall into this category.
  • Turing Test: Proposed by Alan Turing, this test assesses a machine's ability to exhibit intelligent behavior equivalent to, or indistinguishable from, that of a human. If a human interrogator cannot reliably distinguish between a machine and a human through textual conversation, the machine passes the test.

AI Perspectives

AI can be categorized based on how it approaches intelligence:

  • Thinking Humanly: Cognitive modeling approach, trying to understand how humans think and then replicate it.
  • Acting Humanly: Turing Test approach, focusing on observable behavior that is indistinguishable from human behavior.
  • Thinking Rationally: Laws of thought approach, using logic to build intelligent systems.
  • Acting Rationally: Rational agent approach, focusing on actions that achieve the best outcome given the available information. This is the most widely adopted approach in modern AI.

History of AI

AI has a rich history marked by periods of rapid progress and "AI winters." Key milestones include:

  • 1940s-1950s: Foundations laid by pioneers like Alan Turing (Turing Test), John von Neumann (computer architecture), and Norbert Wiener (cybernetics).
  • 1956: Dartmouth Workshop, where the term "Artificial Intelligence" was coined by John McCarthy, marking the birth of AI as a field.
  • 1960s-1970s: Early expert systems (DENDRAL, MYCIN), symbolic reasoning, general problem solver (GPS).
  • 1980s: Expert systems boom, followed by an "AI winter" due to limitations and high costs.
  • 1990s: Rise of machine learning, data mining, and intelligent agents. Deep Blue defeats Garry Kasparov in chess.
  • 2000s-Present: Big data, deep learning revolution, widespread applications in NLP, computer vision, robotics, and autonomous systems.

Applications of AI

AI is pervasive across industries:

  • Expert Systems: Mimic human experts in specific domains (e.g., medical diagnosis, financial planning).
  • Natural Language Processing (NLP): Enables computers to understand, interpret, and generate human language (e.g., chatbots, translation, sentiment analysis).
  • Robotics: Design and development of robots capable of performing tasks autonomously or semi-autonomously (e.g., industrial robots, autonomous vehicles).
  • Machine Vision (Computer Vision): Allows computers to "see" and interpret visual information from images or videos (e.g., facial recognition, object detection).
  • Game Playing: AI agents competing in complex games (e.g., AlphaGo, chess engines).

Foundations of AI

AI draws from diverse fields:

  • Computer Science: Algorithms, data structures, programming.
  • Mathematics: Logic, probability, statistics, optimization.
  • Psychology: Understanding human cognition and behavior.
  • Linguistics: Structure and meaning of language.
  • Philosophy: Mind, knowledge, reasoning, ethics.

Introduction to Agents

An agent is anything that can perceive its environment through sensors and act upon that environment through actuators. An intelligent agent is one that acts rationally, meaning it takes the best possible action in a given situation to achieve its goals. An agent can be seen as a function that maps percept sequences to actions:

f: Percept Sequence -> Action

Structure of Intelligent Agent

  • Agent Program: The internal implementation of the agent function. It takes current percepts and potentially past percepts to decide on an action.
  • Architecture: The hardware and software platform on which the agent program runs (e.g., a robot body, a server running software).

Properties of Intelligent Agents

  • Autonomy: Ability to operate without constant human supervision, learning from experience.
  • Reactivity: Ability to respond to changes in the environment in a timely manner.
  • Proactiveness: Ability to take initiative and pursue goals, not just react to stimuli.
  • Social Ability: Ability to interact with other agents or humans (e.g., via communication, cooperation).

PEAS Description of Agents

A PEAS description defines the characteristics of an agent's task environment:

  • Performance: The criteria for measuring the success of the agent (e.g., points scored, efficiency).
  • Environment: The world in which the agent operates (e.g., a dusty room, a hospital).
  • Actuators: The means by which the agent acts on the environment (e.g., wheels, robotic arm, display screen).
  • Sensors: The means by which the agent perceives the environment (e.g., camera, GPS, keyboard input).

Example: Robotic Vacuum Cleaner

  • Performance: Maximize dirt collected, minimize power consumption, minimize cleaning time, avoid obstacles.
  • Environment: House, furniture, dirt, obstacles.
  • Actuators: Wheels, brush, vacuum cleaner.
  • Sensors: Dirt detector, bump sensor, battery sensor, infrared wall sensor.

Types of Agents

  • Simple Reflex Agents: Act based on the current percept, ignoring percept history. They use a condition-action rule set.
    function SIMPLE-REFLEX-AGENT(percept) returns an action
        state <- INTERPRET-INPUT(percept)
        rule <- RULE-MATCH(state, rules)
        action <- rule.action
        return action
  • Model-Based Reflex Agents: Maintain an internal state (a "model" of the world) to keep track of unobservable aspects of the environment, then use condition-action rules.
    function MODEL-BASED-REFLEX-AGENT(percept) returns an action
        persistent: state, a description of the current world state
                model, a description of how the world evolves and how agent's actions affect it
                rules, a set of condition-action rules
                action, the most recent action, initially none
        state <- UPDATE-STATE(state, action, percept, model)
        rule <- RULE-MATCH(state, rules)
        action <- rule.action
        return action
  • Goal-Based Agents: Use a model of the world and a set of goals to choose actions that lead to the desired future states. They consider the future consequences of actions.
  • Utility-Based Agents: Similar to goal-based agents but use a utility function to measure how "good" a state is, allowing them to choose among multiple goals or actions with varying degrees of success.

Environment Types

Environments can be characterized by several properties:

  • Deterministic vs. Stochastic:
    • Deterministic: The next state is completely determined by the current state and the action executed.
    • Stochastic: The next state is not entirely determined by the current state and action; there is an element of randomness.
  • Static vs. Dynamic:
    • Static: The environment does not change while the agent is deliberating.
    • Dynamic: The environment can change while the agent is deliberating, requiring continuous sensing.
  • Observable vs. Semi-observable (Partially Observable):
    • Observable: The agent's sensors give it access to the complete state of the environment at all times.
    • Semi-observable: The agent's sensors do not provide a complete picture of the environment state.
  • Discrete vs. Continuous:
    • Discrete: A limited number of distinct states and actions.
    • Continuous: States and actions can take on a range of values (e.g., continuous time, continuous space).
  • Single Agent vs. Multi Agent:
    • Single Agent: Only one agent operates in the environment.
    • Multi Agent: Multiple agents interact, potentially cooperatively or competitively.
  • Episodic vs. Sequential:
    • Episodic: Each action choice is independent of previous actions; past actions don't affect future decisions.
    • Sequential: Current decisions affect future decisions and outcomes.

9.2 Problem Solving and Searching Techniques

Many AI problems can be modeled as searching for a solution in a state space. This involves defining the problem, formulating it as a search problem, and then employing search algorithms to find a path from an initial state to a goal state.

Problem as a State Space Search

A search problem is defined by:

  • State: A complete description of the world.
  • Initial State: The starting point of the search.
  • Actions (Successor Function): A set of possible actions that can be taken from a given state, leading to new states.
  • Transition Model: A description of what each action does, mapping a state and an action to a resulting state.
  • Goal Test: A way to determine if a given state is a goal state.
  • Path Cost: A function that assigns a numerical cost to each path (sequence of actions).

Problem Formulation (Vacuum World Example)

Consider a simple vacuum cleaner agent in a two-room environment (A and B), where each room can be clean or dirty.

  • States: (Location of agent, State of Room A, State of Room B). Example: (A, Dirty, Clean). There are 2 locations * 2 states for A * 2 states for B = 8 possible states.
  • Initial State: Any of the 8 states, e.g., (A, Dirty, Dirty).
  • Actions: Suck (cleans current room), Left (move to A), Right (move to B).
  • Transition Model:
    • If in A and Suck: If A was Dirty, it becomes Clean. Agent stays in A.
    • If in A and Right: Agent moves to B.
    • If in B and Left: Agent moves to A.
  • Goal Test: Is Room A Clean AND Room B Clean? (e.g., (A, Clean, Clean) or (B, Clean, Clean)).
  • Path Cost: Each action costs 1 unit.

Well-Defined Problems

Examples of classic well-defined problems in AI:

  • 8-Puzzle: A 3x3 grid with 8 numbered tiles and one blank space. The goal is to rearrange the tiles into a specific order by sliding them into the blank space.
  • 8-Queens Problem: Place 8 queens on an 8x8 chessboard such that no two queens attack each other (no two queens share the same row, column, or diagonal).
  • Traveling Salesperson Problem (TSP): Given a list of cities and the distances between each pair of cities, find the shortest possible route that visits each city exactly once and returns to the origin city.

Constraint Satisfaction Problem (CSP)

A CSP is a special type of problem where the goal is to find a state that satisfies a given set of constraints. It's defined by:

  • Variables: A set of variables X = {X1, ..., Xn}.
  • Domains: A set of possible values for each variable D = {D1, ..., Dn}.
  • Constraints: A set of constraints C = {C1, ..., Cm} that specify allowable combinations of values for subsets of variables.

Example: Map Coloring

  • Variables: States/Regions on a map (e.g., WA, NT, SA, Q, NSW, V, T).
  • Domains: {red, green, blue} for each variable.
  • Constraints: Adjacent regions must have different colors (e.g., WA ≠ NT, SA ≠ WA).

Uninformed Search (Blind Search)

These algorithms explore the state space without using any domain-specific knowledge about the goal's location. They only distinguish goal states from non-goal states.

  • Breadth-First Search (BFS):
    • Algorithm: Explores all nodes at the current depth level before moving to the next level. Uses a queue.
    • Completeness: Yes, if a solution exists and the branching factor is finite.
    • Optimality: Yes, if all step costs are equal (or non-decreasing with depth).
    • Time Complexity: O(b^d), where b is branching factor, d is depth of solution.
    • Space Complexity: O(b^d), as it stores all nodes at the current depth.
  • Depth-First Search (DFS):
    • Algorithm: Explores as far as possible down each branch before backtracking. Uses a stack.
    • Completeness: No, can get stuck in infinite paths. Yes, if graph is finite and no loops.
    • Optimality: No, may find a non-optimal solution first.
    • Time Complexity: O(b^m), where m is maximum depth of search space.
    • Space Complexity: O(bm), stores only the current path.
  • Depth-Limited Search (DLS):
    • Algorithm: DFS with a predefined depth limit L. Nodes beyond L are treated as if they have no successors.
    • Completeness: No, if the shallowest solution is beyond L.
    • Optimality: No.
    • Time Complexity: O(b^L).
    • Space Complexity: O(bL).
  • Iterative Deepening Search (IDS):
    • Algorithm: Repeatedly performs DLS, incrementing the depth limit L from 0 up to 1, 2, ... until a goal is found. Combines advantages of BFS (completeness, optimality) and DFS (space efficiency).
    • Completeness: Yes, if a solution exists.
    • Optimality: Yes, if step costs are equal.
    • Time Complexity: O(b^d) (similar to BFS, but with repeated work, still asymptotically optimal).
    • Space Complexity: O(bd) (similar to DFS).
  • Bidirectional Search:
    • Algorithm: Runs two simultaneous searches, one forward from the initial state and one backward from the goal state, until they meet in the middle. Requires knowing how to generate predecessors and a well-defined goal state.
    • Completeness: Yes, if both BFS searches are used.
    • Optimality: Yes, if both BFS searches are used and step costs are equal.
    • Time Complexity: O(b^(d/2)).
    • Space Complexity: O(b^(d/2)).

Informed Search (Heuristic Search)

These algorithms use problem-specific knowledge (heuristics) to guide the search towards the goal, making them more efficient than uninformed searches.

  • Greedy Best-First Search:
    • Algorithm: Expands the node that appears to be closest to the goal, as estimated by a heuristic function h(n). Uses a priority queue ordered by h(n).
    • Completeness: No, can get stuck in infinite paths or local minima.
    • Optimality: No.
    • Time Complexity: O(b^m) in worst case.
    • Space Complexity: O(b^m) in worst case.
  • A* Search:
    • Algorithm: Combines uniform-cost search (which is optimal) with greedy best-first search (which is efficient). It expands the node n with the lowest value of f(n).
    • Evaluation Function: f(n) = g(n) + h(n)
      • g(n): The cost from the initial state to node n.
      • h(n): The estimated cost from node n to the goal state (heuristic function).
    • Admissible Heuristics: A heuristic h(n) is admissible if it never overestimates the true cost to reach the goal (i.e., h(n) <= h*(n), where h*(n) is the true cost). Admissible heuristics ensure optimality for A*.
    • Completeness: Yes, if h(n) is admissible, branching factor is finite, and step costs are non-negative.
    • Optimality: Yes, if h(n) is admissible and consistent (monotonic).
    • Time Complexity: Exponential, but often much better than uninformed search depending on heuristic quality.
    • Space Complexity: Exponential, as it keeps all generated nodes in memory.

Local Search Algorithms

These algorithms operate on a single current state, moving to a neighbor state if it improves the objective function, rather than exploring paths from the initial state. They are good for optimization problems where only the final state matters, not the path.

  • Hill Climbing:
    • Algorithm: Iteratively moves from the current state to an adjacent state that offers a better solution, until no better neighbor is found (reaches a local optimum).
    • Simple Hill Climbing: Chooses the first neighbor that is better than the current state.
    • Stochastic Hill Climbing: Chooses randomly among the uphill moves, with probability of selection depending on the steepness of the ascent.
    • First-Choice Hill Climbing: Generates successors randomly until one is found that is better than the current state.
    • Random-Restart Hill Climbing: Conducts a series of hill-climbing searches from randomly generated initial states. If one fails to find a good solution, it restarts. This helps escape local optima.
    • Drawbacks: Prone to getting stuck in local maxima, ridges, and plateaus.
  • Simulated Annealing:
    • Algorithm: Inspired by the annealing process in metallurgy. It allows "bad" moves (moving to a worse state) with a certain probability, especially early in the search, to escape local optima. This probability decreases over time (controlled by a "temperature" parameter).
    • Acceptance Probability Formula: If a new state is worse than the current state by ΔE (where ΔE > 0), it is accepted with probability P = e^(-ΔE/T).
      • ΔE: The increase in cost (or decrease in quality) of the new state compared to the current state.
      • T: The "temperature" parameter, which decreases over time (cooling schedule).
    • As T decreases, the probability of accepting worse moves also decreases. If T is lowered slowly enough, simulated annealing can find a global optimum.

Game Playing (Adversarial Search)

Involves agents trying to maximize their own utility in an environment where other agents are also trying to maximize theirs.

  • Minimax Search:
    • Algorithm: A decision rule used in game theory for minimizing the possible loss for a worst-case (maximum loss) scenario. It assumes the opponent is also playing optimally.
    • Constructs a game tree representing possible moves and resulting states.
    • Max Player: Aims to maximize its utility value.
    • Min Player: Aims to minimize the Max player's utility value (i.e., maximize its own utility, which is the negative of Max's).
    • Works backward from the terminal states (leaf nodes) where utility values are known. At Min nodes, the minimum utility of children is chosen. At Max nodes, the maximum utility of children is chosen.
  • Alpha-Beta Pruning:
    • Algorithm: An optimization technique for Minimax search that eliminates branches of the search tree that cannot possibly influence the final decision. It dramatically reduces the number of nodes to evaluate.
    • Maintains two values:
      • Alpha (α): The best (highest) value found so far for the MAX player on the path to the current state.
      • Beta (β): The best (lowest) value found so far for the MIN player on the path to the current state.
    • Pruning Conditions:
      • If α >= β at any point, the current branch can be pruned. This means that if the MAX player's current best option (α) is already better than the MIN player's current worst option (β), then the MIN player will never choose this path, so further exploration is unnecessary.

9.3 Knowledge Representation

Knowledge representation (KR) is the field of AI dedicated to representing information about the world in a form that a computer system can use to solve complex tasks. It's crucial for enabling intelligent behavior by allowing systems to reason, learn, and understand.

Knowledge Representations and Mappings

KR involves defining a formal language or notation to describe facts about the world. This often includes:

  • Syntax: Rules for constructing valid sentences in the representation language.
  • Semantics: Rules for determining the meaning of sentences.
  • Inference Mechanism: Procedures for deriving new facts from existing ones.

Approaches to Knowledge Representation

  • Logical Representation: Uses formal logic (e.g., propositional logic, predicate logic) to represent knowledge as a set of logical sentences.
  • Semantic Network: Represents knowledge as a graph where nodes are concepts/objects and links are relationships.
  • Frames: Structured representation using "frames" (templates) with "slots" to describe typical objects or situations.
  • Rules (Production Rules): Uses IF-THEN rules to represent knowledge, often used in expert systems.

Issues in Knowledge Representation

  • Complexity: Representing complex, real-world knowledge can be challenging due to the vastness and intricacy of information.
  • Uncertainty: Dealing with incomplete, inconsistent, or probabilistic knowledge is difficult.
  • Granularity: Deciding the appropriate level of detail for representation.
  • Computational Efficiency: Ensuring that reasoning processes over the knowledge base are efficient.

Semantic Nets

A semantic network is a graphical representation where:

  • Nodes: Represent concepts, objects, or entities.
  • Labeled Links (Arcs): Represent relationships between nodes (e.g., "is-a", "has-part", "agent").
  • Inheritance: A key feature where properties of a superclass are inherited by its subclasses (e.g., "Bird is-a Animal" implies Bird inherits properties of Animal).

Example: (Bird) --is-a--> (Animal) (Bird) --can--> (Fly) (Tweety) --is-a--> (Bird)

Frames

Frames provide a structured way to represent stereotypical situations or objects. Each frame has:

  • Frame Name: Identifies the concept being described (e.g., "Car").
  • Slots: Attributes or properties of the concept (e.g., "Number_of_Wheels", "Color", "Engine_Type").
  • Fillers: Specific values for the slots (e.g., "4", "Red", "Internal Combustion").
  • Defaults: Typical values for slots that can be overridden (e.g., "Number_of_Wheels: default 4").
  • Procedures (Facets): Attached procedures that can be executed when a slot is accessed or modified (e.g., "if-needed" to compute a value).
Frame: Car
  Slots:
    Manufacturer: (value: Ford, Honda, ...)
    Model: (value: Mustang, Civic, ...)
    Color: (default: Red)
    Number_of_Wheels: (value: 4)
    Engine_Type: (value: Internal Combustion, Electric)
    Owner: (if-needed: find_owner_from_registration)

Propositional Logic (PL)

A basic form of logic where statements (propositions) are treated as atomic units that are either true or false.

  • Syntax:
    • Atomic propositions (e.g., P, Q, R)
    • Logical connectives:
      • AND (∧, conjunction)
      • OR (∨, disjunction)
      • NOT (¬, negation)
      • IMPLIES (→, conditional)
      • BICONDITIONAL (↔, equivalence)
  • Semantics: Defines the truth value of complex sentences based on the truth values of atomic propositions and the meaning of connectives.
  • Well-Formed Formula (WFF): A syntactically correct expression in PL.
  • Tautology: A WFF that is always true, regardless of the truth values of its atomic propositions (e.g., P ∨ ¬P).
  • Validity: A WFF is valid if it is a tautology. An argument is valid if its conclusion is true whenever its premises are true.

Inference using Resolution (Resolution Refutation)

Resolution is a powerful inference rule used to prove the satisfiability or unsatisfiability of a set of clauses in conjunctive normal form (CNF). Resolution refutation attempts to prove a statement by showing that its negation leads to a contradiction (falsehood).

  • Algorithm:
    1. Convert all knowledge base sentences to CNF.
    2. Negate the query (the statement to be proven) and convert it to CNF. Add it to the knowledge base.
    3. Repeatedly apply the resolution rule: If there are two clauses (A ∨ B) and (¬B ∨ C), infer (A ∨ C). The literal B and ¬B are "resolved away".
    4. If the empty clause ([], representing False) is derived, then the original query is proven true (because its negation led to a contradiction).

Predicate Logic (First-Order Predicate Logic - FOPL)

An extension of propositional logic that allows for more expressive representations by including predicates, variables, and quantifiers.

  • Syntax:
    • Predicates: Relations between objects (e.g., Is_King(John), Brother(Richard, John)).
    • Variables: Symbols that can stand for any object (e.g., x, y).
    • Constants: Specific objects (e.g., John, Richard).
    • Functions: Map objects to other objects (e.g., father_of(John)).
    • Quantification:
      • Universal Quantifier (∀): "For all" or "for every" (e.g., ∀x. Is_King(x) → Has_Crown(x)).
      • Existential Quantifier (∃): "There exists" or "for some" (e.g., ∃x. Is_King(x)).
    • Logical connectives (same as PL).
  • Semantics: Defines the truth value of FOPL sentences based on interpretations that assign meanings to constants, predicates, and functions.

Rules of Inference

Mechanisms for deriving new logical conclusions from existing premises.

  • Modus Ponens: If (P → Q) and P are true, then Q is true.
  • Modus Tollens: If (P → Q) and ¬Q are true, then ¬P is true.
  • Resolution: As described above, for clauses in CNF.

Unification Algorithm

A key algorithm in FOPL inference. It finds substitutions for variables that make two logical expressions identical. For example, to unify P(x, A) and P(B, y), the substitution {x/B, y/A} makes both expressions P(B, A).

Resolution Refutation System

The full resolution refutation system for FOPL involves:

  1. Converting all FOPL sentences to Conjunctive Normal Form (CNF) by a process called Skolemization (removing existential quantifiers by introducing Skolem constants/functions) and standardizing variables.
  2. Applying the generalized resolution rule, which combines unification with the propositional resolution rule.
  3. If the empty clause is derived, the query's negation is unsatisfiable, thus the query is true.

Bayes' Rule

A fundamental theorem in probability theory that describes how to update the probability of a hypothesis based on new evidence. It is crucial for reasoning under uncertainty.

P(H|E) = P(E|H) * P(H) / P(E)
  • P(H|E): Posterior probability – the probability of hypothesis H given evidence E.
  • P(E|H): Likelihood – the probability of evidence E given hypothesis H.
  • P(H): Prior probability – the initial probability of hypothesis H before observing any evidence.
  • P(E): Marginal likelihood – the probability of evidence E.

Example: Medical Diagnosis If H is "patient has meningitis" and E is "patient has stiff neck". P(meningitis | stiff neck) = P(stiff neck | meningitis) * P(meningitis) / P(stiff neck)

Bayesian Networks (Belief Networks)

A probabilistic graphical model that represents a set of variables and their conditional dependencies via a directed acyclic graph (DAG).

  • Structure:
    • Nodes: Represent random variables (e.g., diseases, symptoms, events).
    • Directed Edges: Represent direct causal or influential relationships between variables. An edge from A to B means A directly influences B.
  • Conditional Probability Tables (CPTs): Each node has a CPT that quantifies the effect of its parents on the node.
  • Inference: Calculating the posterior probability of variables given observed evidence (e.g., P(Disease | Symptom)). This can be done through exact inference (e.g., variable elimination, clique tree propagation) or approximate inference (e.g., Monte Carlo methods).

Reasoning in Belief Networks

Bayesian networks allow for various types of probabilistic reasoning:

  • Diagnostic Reasoning (Effect to Cause): Inferring causes from observed effects (e.g., what disease given symptoms).
  • Causal Reasoning (Cause to Effect): Inferring effects from known causes (e.g., what symptoms are likely given a disease).
  • Intercausal Reasoning: Explaining away or explaining in phenomena where multiple causes interact.
  • Mixed Reasoning: A combination of the above.

9.4 Expert System and Natural Language Processing

This section explores two significant applications of AI: Expert Systems, which mimic human expertise, and Natural Language Processing, which enables computers to interact with human language.

Expert Systems

Definition: An expert system is a computer program that simulates the judgment and behavior of a human or an organization that has expert knowledge and experience in a particular domain. They are designed to solve complex problems by reasoning through bodies of knowledge, represented mainly as IF-THEN rules rather than through conventional procedural code.

Characteristics:

  • High performance in a specific domain.
  • Can handle symbolic reasoning.
  • Ability to explain its reasoning (transparency).
  • Can deal with uncertainty.
  • Easy to modify and expand knowledge.

Architecture of an Expert System

A typical expert system comprises several key components:

  • Knowledge Base: Contains the domain-specific knowledge, facts, and rules (IF-THEN statements) acquired from human experts.
  • Inference Engine: The "brain" of the expert system. It applies reasoning strategies (e.g., forward chaining, backward chaining) to the knowledge base to derive conclusions and solve problems.
    • Forward Chaining: Starts with known facts and applies rules to deduce new facts until a goal is reached or no more rules can be applied. (Data-driven)
    • Backward Chaining: Starts with a goal (hypothesis) and works backward to find facts that support it, asking for missing information as needed. (Goal-driven)
  • Working Memory (Blackboard): A temporary storage area for current problem facts, intermediate hypotheses, and conclusions.
  • Explanation Facility: Provides users with an explanation of how the system arrived at a conclusion or why it needs certain information. This enhances transparency and trust.
  • User Interface: Facilitates communication between the user and the expert system.
  • Knowledge Acquisition Subsystem (often external): Tools and methods for acquiring knowledge from human experts and encoding it into the knowledge base.

Diagram (conceptual):

+-----------------+      +---------------------+      +-----------------+
|   User Interface| <--->|  Inference Engine   |<---->|   Knowledge Base|
+-----------------+      +---------------------+      +-----------------+
        ^                        ^ ^                        ^
        |                        | |                        |
        +------------------------+ +------------------------+
        |                                                 |
        |             +-------------------------+         |
        |             | Explanation Facility    |<--------+
        |             +-------------------------+         |
        |                                                 |
        +-------------------------------------------------+
                      |
                      v
        +-------------------------+
        |   Working Memory        |
        +-------------------------+

Knowledge Acquisition

The process of extracting, structuring, and organizing knowledge from human experts or other sources for use in an expert system.

  • Manual Knowledge Acquisition: Involves knowledge engineers interviewing experts, observing their problem-solving, and transcribing their knowledge into formal representations. It's often time-consuming and prone to errors.
  • Automated Knowledge Acquisition: Uses machine learning techniques (e.g., rule induction, case-based reasoning) to extract knowledge from data or existing documents, reducing reliance on human experts.

Declarative vs. Procedural Knowledge

  • Declarative Knowledge: "What is" – Represents facts and statements about the world (e.g., "The sky is blue," "A dog is an animal"). Often stored in the knowledge base as facts or rules.
  • Procedural Knowledge: "How to" – Represents actions, steps, or processes for solving problems (e.g., "To open a door, turn the handle, then pull"). Often embedded in the inference engine's control strategies or in procedures attached to frames.

Development of Expert Systems (Life Cycle)

  1. Problem Identification and Assessment: Determine if the problem is suitable for an expert system.
  2. Knowledge Acquisition: Gather knowledge from experts.
  3. Knowledge Representation: Encode the acquired knowledge into a formal representation (e.g., rules, frames).
  4. System Design: Choose the inference engine and overall architecture.
  5. System Development (Prototyping): Build a prototype, test it, and refine the knowledge base and inference engine.
  6. Testing and Evaluation: Validate the system's performance against human experts.
  7. Deployment and Maintenance: Integrate the system into its operational environment and update its knowledge over time.

Natural Language Processing (NLP) Terminology

NLP enables computers to process and understand human language.

  • Tokenization: The process of breaking down a text into smaller units called tokens (words, punctuation marks, numbers).
  • Part-of-Speech (POS) Tagging: Assigning grammatical categories (e.g., noun, verb, adjective) to each word in a sentence.
  • Parsing (Syntactic Analysis): Analyzing the grammatical structure of a sentence to determine the relationships between words, often represented as a parse tree.
  • Stemming: Reducing words to their root form (e.g., "running" -> "run").
  • Lemmatization: Reducing words to their base or dictionary form (lemma), considering context (e.g., "better" -> "good").

Natural Language Understanding and Generation

  • Natural Language Understanding (NLU): The process of converting human language into a formal representation that a computer can process and act upon. It involves interpreting meaning, handling ambiguity, and extracting information.
  • Natural Language Generation (NLG): The process of generating human-readable text from structured data or a machine representation. It involves deciding what to say (content determination), how to structure it (sentence planning), and how to phrase it (lexicalization and realization).

Steps of NLP

  1. Morphological Analysis: Analyzing word structure (morphemes) and their formation.
  2. Syntactic Analysis (Parsing): Analyzing the grammatical structure of sentences to ensure they are well-formed and to identify relationships between words.
  3. Semantic Analysis: Extracting the meaning from words and sentences, considering the context.
  4. Pragmatic Analysis: Interpreting the overall meaning of text in context, considering factors like speaker intent, discourse structure, and real-world knowledge.

Applications of NLP

  • Machine Translation: Automatically translating text or speech from one language to another (e.g., Google Translate).
  • Text Summarization: Generating concise summaries of longer texts.
  • Chatbots and Virtual Assistants: AI systems that can converse with humans to answer questions or perform tasks (e.g., Siri, Alexa).
  • Sentiment Analysis: Determining the emotional tone or opinion expressed in a piece of text.
  • Information Extraction: Automatically extracting structured information from unstructured text.

NLP Challenges

  • Ambiguity: Words or sentences can have multiple meanings (e.g., "bank" - river bank vs. financial institution).
    • Lexical Ambiguity: A single word having multiple meanings.
    • Syntactic Ambiguity: A sentence having multiple grammatical interpretations.
    • Referential Ambiguity: Pronouns or noun phrases referring to multiple entities.
  • Context: Understanding the meaning often requires understanding the surrounding text and real-world knowledge.
  • Co-reference Resolution: Identifying when different expressions in a text refer to the same entity (e.g., "John bought a car. He drove it home.").
  • Idioms and Figurative Language: Expressions whose meaning cannot be deduced from the literal meaning of their words.

Machine Vision Concepts

Machine Vision (or Computer Vision) enables computers to interpret and understand visual information from the real world.

  • Image Processing: Techniques for manipulating and enhancing digital images (e.g., filtering, noise reduction, edge detection).
  • Feature Extraction: Identifying and isolating distinctive characteristics or patterns from an image (e.g., corners, edges, textures, shapes, color histograms) that are useful for analysis and recognition.

Machine Vision Stages

  1. Image Acquisition: Capturing an image using sensors (e.g., cameras, scanners).
  2. Image Preprocessing: Enhancing the image for subsequent processing, removing noise, correcting distortions, normalizing lighting.
  3. Segmentation: Dividing the image into meaningful regions or objects (e.g., isolating a car from its background).
  4. Feature Extraction: Computing descriptive features from the segmented regions.
  5. Recognition/Interpretation: Identifying and classifying objects or patterns based on their extracted features, often using machine learning models.

Robotics

The field of engineering and computer science that deals with the design, construction, operation, and application of robots.

  • Types of Robots:
    • Industrial Robots: Used in manufacturing for repetitive tasks (e.g., assembly, welding, painting). Typically fixed-base manipulators.
    • Mobile Robots: Capable of movement in an environment (e.g., autonomous vehicles, drones, vacuum cleaners).
    • Humanoid Robots: Designed to resemble and interact with humans (e.g., ASIMO, Sophia).
    • Collaborative Robots (Cobots): Designed to work safely alongside humans in shared workspaces.
    • Medical Robots: Used in surgery, rehabilitation, and drug delivery.

9.5 Machine Learning

Machine Learning (ML) is a subset of AI that enables systems to learn from data, identify patterns, and make decisions with minimal human intervention. It involves developing algorithms that can automatically improve through experience.

Introduction to Machine Learning

  • Supervised Learning:
    • Concept: The algorithm learns from labeled training data, where each input example is paired with an expected output. The goal is to learn a mapping from inputs to outputs.
    • Tasks: Classification (predicting discrete labels) and Regression (predicting continuous values).
    • Examples: Spam detection, image classification, house price prediction.
  • Unsupervised Learning:
    • Concept: The algorithm learns from unlabeled data, identifying hidden patterns or structures within the data without explicit guidance.
    • Tasks: Clustering (grouping similar data points) and Dimensionality Reduction (reducing the number of features while retaining important information).
    • Examples: Customer segmentation, anomaly detection, topic modeling.
  • Reinforcement Learning (RL):
    • Concept: An agent learns to make decisions by interacting with an environment, receiving rewards for desired actions and penalties for undesirable ones. The goal is to learn a policy that maximizes cumulative reward.
    • Components:
      • Reward: A numerical signal indicating the desirability of an action.
      • Policy: A strategy that maps states to actions.
      • Value Function: A prediction of future rewards.
    • Examples: Game playing (AlphaGo), robotics control, autonomous driving.

Concepts of Learning

  • Instance-Based Learning: Stores training examples and makes predictions based on their similarity to new instances (e.g., K-Nearest Neighbors). "Lazy learning."
  • Model-Based Learning: Builds an explicit model from training data, then uses this model to make predictions on new data (e.g., linear regression, decision trees, neural networks). "Eager learning."

Inductive Learning: Decision Tree

Decision trees are non-parametric supervised learning methods used for classification and regression. They partition the data into subsets based on feature values, forming a tree-like structure of decisions.

  • ID3 (Iterative Dichotomiser 3): An algorithm for constructing a decision tree by iteratively selecting the attribute that provides the highest Information Gain to split the data.
  • C4.5: An extension of ID3 that handles both continuous and discrete attributes, missing values, and pruning of the tree after creation.
  • Gini Impurity: A measure of impurity or disorder used in CART (Classification and Regression Trees) algorithms. A Gini impurity of 0 indicates a pure node (all samples belong to the same class).
    Gini(D) = 1 - Σ (p_i)^2
    Where D is the dataset, and p_i is the proportion of instances belonging to class i in the dataset.
  • Information Gain: The reduction in entropy (or uncertainty) achieved by splitting a dataset on a particular attribute. The attribute with the highest information gain is chosen as the splitting criterion.
    Information Gain(D, A) = Entropy(D) - Σ (|D_v| / |D|) * Entropy(D_v)
    Where D is the parent dataset, A is the attribute, D_v is the subset of D for which attribute A has value v, |D| is the number of instances in D, and Entropy(S) = -Σ p_i * log2(p_i).

Example: Deciding whether to play tennis A decision tree might split based on "Outlook" (Sunny, Overcast, Rain), then "Humidity", then "Windy". Each leaf node gives a "Play" or "Don't Play" decision.

Statistical-based Learning: Naive Bayes Model

A probabilistic classifier based on Bayes' theorem with a "naive" independence assumption between features. It's often used for text classification.

  • Bayes' Theorem: P(Class | Features) = P(Features | Class) * P(Class) / P(Features)
  • Independence Assumption: Assumes that the presence of a particular feature in a class is independent of the presence of any other feature, given the class.
    P(x1, ..., xn | C) = P(x1 | C) * P(x2 | C) * ... * P(xn | C)
  • Algorithm:
    1. Calculate prior probabilities for each class P(C).
    2. Calculate likelihoods for each feature given each class P(x_i | C).
    3. For a new data point, calculate P(C | x_1, ..., x_n) for each class and predict the class with the highest posterior probability.

Fuzzy Learning

Integrates fuzzy logic into machine learning to handle uncertainty and imprecision in data. Fuzzy logic allows for degrees of truth rather than absolute true/false.

  • Fuzzy Sets: Sets where elements can have degrees of membership, ranging from 0 (not a member) to 1 (full member).
    Example: A person can be "tall" to a degree of 0.8, not just entirely tall or not tall.
  • Membership Functions: Define the degree to which an element belongs to a fuzzy set. Common shapes include triangular, trapezoidal, and Gaussian.
    Formula (Triangular):
    μ_A(x) = max(0, min((x - a) / (b - a), (c - x) / (c - b)))
    Where a, b, c are parameters defining the triangle's shape.

Fuzzy Inference System (FIS)

A system that uses fuzzy logic to map inputs to outputs. It typically involves fuzzification, fuzzy rules, inference, and defuzzification.

  • Mamdani Method:
    • Generates fuzzy outputs (e.g., "output is high").
    • Requires defuzzification (e.g., centroid method) to convert fuzzy output into a crisp numerical value.
    • More intuitive and human-readable rules.
  • Sugeno Method (Takagi-Sugeno-Kang):
    • Generates crisp outputs directly or uses linear functions of the input variables in the consequent part of the rules.
    • More computationally efficient, often used in control systems.
    • No defuzzification step in the traditional sense.

Genetic Algorithm (GA)

A search heuristic inspired by the process of natural selection and evolution. It's used to find optimal or near-optimal solutions to optimization and search problems.

  • Algorithm Steps:
    1. Initialization: Create an initial population of candidate solutions (chromosomes/individuals).
    2. Fitness Evaluation: Assess the "fitness" of each individual using a fitness function.
    3. Selection: Select individuals from the current population to be parents for the next generation, typically favoring fitter individuals.
    4. Crossover (Recombination): Combine genetic material from two parents to create new offspring.
    5. Mutation: Randomly alter some genes in the offspring to introduce diversity.
    6. Replacement: Replace the old population with the new generation.
    7. Repeat from step 2 until a termination condition is met (e.g., max generations, satisfactory fitness).
  • Operators:
    • Selection: Determines which individuals from the current generation will be chosen to create offspring.
      • Roulette Wheel Selection: Probability of selection is proportional to fitness.
      • Tournament Selection: Randomly selects a subset of individuals and chooses the fittest one from that subset.
    • Crossover: Combines parts of two parent chromosomes to create two new offspring. Common types include one-point, two-point, and uniform crossover.
    • Mutation: Randomly changes one or more genes in a chromosome to maintain genetic diversity and prevent premature convergence.
  • Encoding: How candidate solutions are represented as chromosomes.
    • Binary Encoding: Solutions represented as binary strings (e.g., `010110`).
    • Real-Valued Encoding: Solutions represented as floating-point numbers.
  • Fitness Function: Quantifies the quality of each candidate solution. It maps a chromosome to a numerical score.
  • Parameters:
    • Population Size: Number of individuals in each generation.
    • Crossover Rate: Probability that two parents will perform crossover.
    • Mutation Rate: Probability that a gene will be mutated.

9.6 Neural Networks

Neural Networks (NNs) are computational models inspired by the structure and function of biological neural networks in the brain. They are powerful tools for pattern recognition, classification, and regression, forming the basis of deep learning.

Biological Neural Networks vs. Artificial Neural Networks (ANN)

  • Biological Neural Networks: Composed of neurons, dendrites, axons, and synapses. Information is transmitted via electrochemical signals. Highly parallel, adaptive, and fault-tolerant.
  • Artificial Neural Networks (ANN): Composed of interconnected artificial neurons (nodes) organized in layers. Information is transmitted as numerical values (activations) through weighted connections. They learn by adjusting these weights.

McCulloch-Pitts Neuron

The first mathematical model of an artificial neuron, proposed in 1943. It's a binary threshold unit.

  • Takes binary inputs (0 or 1).
  • Each input has a fixed weight.
  • Computes a weighted sum of inputs.
  • If the sum exceeds a threshold, the neuron fires (output 1); otherwise, it remains inactive (output 0).

  • Output = 1 if Σ (w_i * x_i) >= threshold else 0

Mathematical Model of ANN

A typical artificial neuron (or perceptron) calculates a weighted sum of its inputs, adds a bias, and then passes this sum through an activation function.

Net Input (z) = Σ (w_i * x_i) + b Output (a) = Activation_Function(z)
  • x_i: Input values from previous layer/features.
  • w_i: Weights associated with each input.
  • b: Bias term (an intercept that allows the activation function to be shifted).
  • Σ: Summation.
  • Activation_Function: A non-linear function applied to the net input.

Activation Functions with Formulas

Non-linear functions that introduce non-linearity into the network, allowing it to learn complex patterns.

  • Step Function (Binary Step):
    f(x) = 1 if x >= 0 else 0
    Used in McCulloch-Pitts neurons and early perceptrons. Not differentiable, so unsuitable for gradient-based learning.
  • Sigmoid Function (Logistic):
    f(x) = 1 / (1 + e^(-x))
    Squashes input values to a range between 0 and 1. Good for binary classification output layers. Suffers from vanishing gradients.
  • Tanh Function (Hyperbolic Tangent):
    f(x) = (e^x - e^(-x)) / (e^x + e^(-x))
    Squashes input values to a range between -1 and 1. Zero-centered output, often preferred over sigmoid in hidden layers. Still suffers from vanishing gradients.
  • ReLU Function (Rectified Linear Unit):
    f(x) = max(0, x)
    Outputs the input directly if positive, otherwise outputs zero. Popular in hidden layers due to computational efficiency and mitigating vanishing gradient problem for positive inputs.
  • Softmax Function:
    f(x_i) = e^(x_i) / Σ (e^(x_j)) for all j in the output layer
    Used in the output layer of multi-class classification networks. Converts a vector of arbitrary real values into a probability distribution, where values sum to 1.

Architectures of Neural Networks

  • Feedforward Neural Networks (FNN):
    • Information flows in one direction, from input layer through hidden layers to the output layer, without loops or cycles.
    • Includes Single-Layer Perceptrons and Multilayer Perceptrons.
    • Suitable for tasks like classification and regression.
  • Recurrent Neural Networks (RNN):
    • Have connections that loop back, allowing information to persist across time steps.
    • Suitable for sequential data (e.g., time series, natural language) where context matters.
    • Examples: LSTMs (Long Short-Term Memory), GRUs (Gated Recurrent Units).
  • Convolutional Neural Networks (CNN):
    • Specifically designed for processing grid-like data, such as images.
    • Use convolutional layers to automatically learn spatial hierarchies of features.
    • Highly effective for computer vision tasks.

The Perceptron

The simplest type of feedforward neural network, capable of learning linearly separable patterns.

  • Single Layer: Consists of an input layer, a single output neuron, and no hidden layers.
  • Learning Rule (Perceptron Learning Rule): An iterative algorithm that adjusts weights based on the error between the predicted output and the target output.
    w_new = w_old + learning_rate * (target - output) * input
    b_new = b_old + learning_rate * (target - output)
    Where learning_rate is a small positive constant.
  • Convergence Theorem: If the data is linearly separable, the Perceptron learning algorithm is guaranteed to converge to a solution in a finite number of steps.

The Learning Rate

A hyperparameter that controls how much the weights of the network are adjusted with respect to the loss gradient during training.

  • Effect on Convergence:
    • High Learning Rate: Can lead to faster convergence initially but may overshoot the optimal solution, causing oscillations or divergence.
    • Low Learning Rate: Ensures smoother convergence but can make the training process very slow and may get stuck in local minima.

Gradient Descent

An iterative optimization algorithm used to find the minimum of a function (typically the loss function in ANNs) by repeatedly moving in the direction of the steepest descent (negative of the gradient).

  • Batch Gradient Descent:
    • Calculates the gradient of the loss function with respect to all training examples in the dataset (the "batch") before updating weights.
    • Provides a stable gradient estimate but can be slow for large datasets.
  • Stochastic Gradient Descent (SGD):
    • Calculates the gradient and updates weights for each individual training example.
    • Faster updates, but the gradient estimate is noisy, leading to oscillations. Can help escape local minima.
  • Mini-Batch Gradient Descent:
    • A compromise between Batch GD and SGD. Calculates the gradient and updates weights for a small subset (mini-batch) of training examples.
    • Offers a good balance between stability and computational efficiency. Most commonly used.

The