Skip to content

Digital Forensics and Cybercrime Investigation

Summary

This chapter introduces the principles and practices of digital evidence collection and analysis. Students learn the distinction between volatile (RAM) and non-volatile (HDD, SSD) digital storage, and why the order of evidence collection is determined by volatility. Write-blocker hardware prevents evidence contamination during forensic imaging, and bit-stream copies preserve an exact duplicate of the original media. Cryptographic hash functions (MD5 and SHA-256) verify that copies are forensically identical to originals. The chapter then covers EXIF metadata recovery, digital timestamp analysis, and geolocation extraction from image files — tools that form the foundation for Chapters 16–18. Network forensics (IP analysis, packet sniffing, firewall logs), encryption, steganography, and dark web evidence tracking round out the chapter. After completing this chapter, students will be able to verify the integrity of a forensic image using hash values and explain the evidentiary significance of EXIF metadata.

Concepts Covered

This chapter covers the following 21 concepts from the learning graph:

  1. Digital Forensics Overview
  2. Volatile vs Non-Volatile Data
  3. RAM Evidence Collection
  4. Hard Drive Architecture
  5. SSD Forensics
  6. Write-Blocker Hardware
  7. Forensic Imaging Process
  8. Bit-Stream Copy
  9. MD5 Hash Function
  10. SHA-256 Hash Function
  11. Hash Verification
  12. EXIF Metadata Recovery
  13. Digital Timestamp Analysis
  14. Geolocation from Metadata
  15. Network Forensics Basics
  16. IP Address Analysis
  17. Packet Sniffing
  18. Firewall Log Analysis
  19. Encryption in Digital Forensics
  20. Steganography
  21. Dark Web Evidence

Prerequisites

This chapter builds on concepts from:


Welcome, Investigators!

Trace waving welcome Every digital device is a potential witness. Smartphones remember where you were. Laptops store what you typed. Network logs record what you accessed and when. Digital forensics is the discipline of recovering, preserving, and analyzing that electronic testimony in a way that courts will accept. The rules are different from physical evidence — but they are just as strict. Follow the evidence — into the digital realm.


What Is Digital Forensics?

Digital forensics is the application of scientific methods to the recovery, preservation, analysis, and presentation of digital evidence in legal proceedings. Digital evidence can exist on any device that stores, processes, or transmits data — computers, smartphones, tablets, servers, surveillance cameras, vehicle infotainment systems, smart appliances, and network infrastructure.

The fundamental challenge of digital forensics is that digital evidence is uniquely fragile: it can be overwritten, deleted, encrypted, or destroyed with a single command — often without visible physical trace. The response to this fragility is a strict procedural framework that begins before a single file is examined.


The Volatility Hierarchy: Order of Collection

Not all digital data is equally permanent. The most critical principle guiding digital evidence collection is the order of volatility — the rule that evidence should be collected in order from most volatile (most easily lost) to least volatile (most permanent). Collecting non-volatile data first while ignoring volatile data is a common and irreversible mistake.

Volatile vs. Non-Volatile Data

Volatile data exists only while power is applied to the device. The moment a device is powered off, volatile data is lost permanently. The primary location of volatile data is RAM (Random Access Memory) — the device's active working memory.

Non-volatile data persists after the device is powered off. This includes data stored on hard disk drives (HDD), solid-state drives (SSD), USB drives, optical discs, and cloud storage.

The volatility hierarchy from most to least volatile: 1. CPU registers and cache 2. RAM contents (active processes, open files, network connections, encryption keys in use) 3. Network connections and routing tables 4. Running processes 5. Disk data (files, deleted files, system logs) 6. Remote/cloud storage

RAM Evidence Collection

RAM evidence collection (also called memory forensics or live acquisition) captures the contents of a device's active memory before shutdown. RAM may contain:

  • Encryption keys — disk encryption software holds decryption keys in RAM while the system is running; once the system is powered off, the keys are gone and encrypted drives become inaccessible
  • Active processes — what programs were running and their current state
  • Open network connections — active connections to remote servers, IP addresses being communicated with
  • Recent command history — commands typed in terminals or shells
  • Password hashes and session tokens from authenticated applications

Tools such as Volatility Framework or DumpIt create a memory dump — a forensic copy of RAM contents — that can be analyzed offline. RAM collection requires the device to remain powered on during acquisition.

Critical procedure point: If a device is running and suspected of containing active malware, running encryption, or connected to remote systems, collect RAM before powering down — then image the disk.


Disk Forensics: Hardware Architecture and Imaging

Hard Drive Architecture

A hard disk drive (HDD) stores data on magnetic platters that spin at high speed. Data is read and written by read/write heads that move across the platter surface. HDDs are organized into concentric tracks, subdivided into sectors (typically 512 bytes each), grouped into clusters (the minimum allocation unit for the filesystem). When a file is "deleted," the filesystem typically marks the cluster as available for reuse but does not overwrite the data — the content persists until overwritten by new data.

SSD Forensics

Solid-state drives (SSDs) use flash memory cells rather than magnetic platters. SSDs present unique forensic challenges compared to HDDs:

  • TRIM command: When a file is deleted on an SSD, the operating system may issue a TRIM command that instructs the drive to erase the flash blocks immediately — making deleted file recovery much harder or impossible
  • Wear leveling: SSDs distribute writes across all flash cells to equalize wear, meaning that recently written data may not be in the expected logical location
  • No overwrite guarantee: On HDDs, overwriting a sector is straightforward; on SSDs, the drive may write to a new physical location and leave the old data in a remapped sector

Forensic examiners must account for TRIM and wear leveling behavior when attempting to recover deleted data from SSDs.

Write-Blocker Hardware

Before connecting a suspect drive to an examination computer, a write-blocker is inserted between the drive and the forensic workstation. A write-blocker is a hardware device that allows data to flow from the drive to the computer (read operations) while physically preventing any data from flowing in the reverse direction (write operations).

Write-blockers are essential because: - Connecting a drive directly to a computer may cause the operating system to automatically mount the drive, write access timestamps, create swap files, or record other metadata — modifying the evidence - Any modification to the original drive, however small, can invalidate the chain of custody and render the evidence inadmissible - Hardware write-blockers are preferred over software write-blockers because hardware solutions are independent of the examination operating system

Forensic Imaging Process and Bit-Stream Copy

A forensic image is an exact, sector-by-sector copy of a storage device — including every allocated file, deleted file fragment, slack space (unused space within allocated clusters), and unallocated space. The process of creating a forensic image is called imaging or acquisition.

A bit-stream copy (also called a forensic clone) duplicates every single bit on the source drive in sequence, including: - Active files and directories - Deleted files not yet overwritten - File system metadata (creation, modification, and access timestamps) - Slack space (the unused portion at the end of the last cluster allocated to a file) - Unallocated space (previously used space now marked as free)

Tools such as dd (disk dump), FTK Imager, and Cellebrite UFED create bit-stream copies. The resulting image file is forensically equivalent to the original drive and can be analyzed without risk to the original evidence.

What Does the Data Tell Us?

Trace peering through magnifying glass A forensic image captures more than just the active files. Slack space between the end of a file and the end of its allocated cluster may contain fragments of previously stored data. Unallocated space may contain complete files that were "deleted" but never overwritten. Skilled examiners use file carving tools that scan unallocated space for known file headers (JPEG, PDF, DOCX) to reconstruct deleted files without relying on the file system's index — which may have already had its record overwritten.


Hash Verification: The Digital Chain of Custody

The digital equivalent of a chain of custody seal is a cryptographic hash value. A hash function takes any input (a file, a disk image, a string of text) and produces a fixed-length output — a digest — that is uniquely associated with that specific input. Any change to the input, no matter how small, produces a completely different digest.

MD5 Hash Function

MD5 (Message Digest 5) produces a 128-bit (32 hexadecimal character) digest. Example:

MD5("forensic image of drive")  →  a3f5d9e2b1c8470f9a0d6e3b2c5a4f87

Change a single bit in the input → completely different 32-character output.

MD5 is fast and widely supported; it is still used in digital forensics for integrity verification despite being computationally broken for collision attacks (finding two different inputs with the same MD5 hash) in cryptographic security contexts. For forensic integrity verification (confirming the image has not changed since acquisition), MD5 remains widely accepted.

SHA-256 Hash Function

SHA-256 (Secure Hash Algorithm 256-bit) produces a 256-bit (64 hexadecimal character) digest. SHA-256 is the current standard for new forensic work because it is cryptographically stronger than MD5 and collision-resistant under current computational capabilities.

Hash Verification

The hash verification procedure is:

  1. Before imaging: compute the hash of the original drive
  2. Create the bit-stream copy
  3. After imaging: compute the hash of the resulting image
  4. Compare the two hash values — if they are identical, the image is a bit-for-bit perfect copy of the original

This procedure is repeated at each transfer: when the image is stored on an evidence server, when it is provided to a defense expert, and before any analysis session begins. A matching hash at each step proves the evidence has not been altered since the original acquisition.

Diagram: Forensic Imaging and Hash Verification Workflow

Forensic Imaging and Hash Verification Workflow Interactive Diagram

Type: workflow sim-id: forensic-imaging-workflow
Library: p5.js
Status: Specified

Learning Objective: Explain the forensic imaging procedure and verify the integrity of a forensic image using hash comparison (Bloom Level 2 — Understand; verb: explain).

Bloom Level: Understand (L2) Bloom Verb: Explain

Canvas layout: - Horizontal workflow with five stations: Original Drive → Write-Blocker → Forensic Workstation → Forensic Image → Hash Verification - Each station is a labeled icon with a detail panel that opens on click

Visual elements: - Animated data flow arrows between stations - Write-blocker shown as a physical block preventing reverse data flow - Hash computation animation at Original Drive and at Forensic Image stations - Comparison panel showing MD5 and SHA-256 hashes side-by-side with green checkmark if matching

Interactive controls: - Click each station to reveal purpose and procedure - "Simulate Tamper" button: modifies one bit in the image and recomputes hash — shows how the hash completely changes even for a 1-bit modification - Toggle: MD5 / SHA-256 display

Data Visibility Requirements: - Show example hash values in correct format (32 hex for MD5, 64 hex for SHA-256) - Show the tamper simulation: one character change in image → completely different hash value - Show the "MATCH" / "MISMATCH" result clearly

Instructional Rationale: An Understand objective (explain the imaging and hash workflow) benefits from a visual, step-by-step workflow diagram with the ability to simulate tampering — making the importance of hash verification concrete and memorable.

Color scheme: Blue for data flow, gray for hardware components, green for verified match, red for mismatch/tamper.


Metadata: The Hidden Record in Digital Files

Every digital file carries more information than its visible content. Metadata is data about data — the hidden record of a file's history, origin, and creation conditions embedded within the file itself or recorded by the operating system.

EXIF Metadata Recovery

EXIF (Exchangeable Image File Format) metadata is embedded in digital photographs and video files by the camera or smartphone at the moment of capture. EXIF data typically includes:

  • Camera make and model — the specific device that captured the image
  • Date and time of capture (drawn from the device's internal clock)
  • GPS coordinates of the capture location (if location services were enabled)
  • Camera settings — aperture, shutter speed, ISO, focal length
  • Software version — the camera firmware or app used to capture the image

EXIF metadata is recovered using forensic tools (ExifTool, Autopsy, FTK) that read the metadata fields embedded in the image file header. The metadata is not visible in the image itself and is not removed when photographs are shared via SMS or uploaded to many social media platforms (though some platforms strip EXIF before display).

Digital Timestamp Analysis

Digital timestamp analysis examines the timestamps associated with files and file system operations. A file typically has three system timestamps:

  • Created (birth) timestamp: when the file first appeared on this file system
  • Modified timestamp: when the file's content was last changed
  • Accessed timestamp: when the file was last opened or read

Timestamps are valuable but require careful interpretation. Timestamps record the local system time, which depends on the device's clock accuracy and time zone settings. Incorrect clocks, time zone changes, and deliberate timestamp manipulation can all produce misleading timestamps. Timestamp analysis involves correlating file timestamps against external time sources (server logs, cell tower records, email headers with server timestamps) to establish a reliable timeline.

Geolocation from Metadata

Geolocation from metadata extracts GPS coordinates embedded in EXIF data to place a photograph at a specific physical location. GPS coordinates in EXIF data are expressed as decimal degrees (latitude and longitude) and can be plotted directly on a map to identify the location where the image was captured — or, when a suspect denies being at a location, to contradict that claim with physical evidence from the device.

Even when GPS coordinates are absent from EXIF, geolocation may still be possible through cell tower triangulation (see Chapter 17) or by identifying recognizable landmarks in the image background.

Investigator Tip

Trace holding magnifying glass When examining EXIF metadata GPS coordinates, always check the timestamp against the GPS coordinates for internal consistency. A photo allegedly taken at a specific location should show the GPS coordinates of that location — and the timestamp should be consistent with the time zone of those coordinates. A GPS location in Tokyo with a device timestamp in Eastern Standard Time, for example, may indicate the device clock was set to the wrong time zone, or that metadata has been manually altered.


Network Forensics

Network forensics is the analysis of network traffic and logs to reconstruct communications, identify participants, and establish timelines of network activity.

IP Address Analysis

Every device on the internet communicates using an IP address — a numerical label that identifies the device on the network. IP address analysis in a forensic investigation traces network communications to their sources and destinations.

Key distinctions: - Public IP address: the address visible on the internet, assigned by the Internet Service Provider (ISP); associated with a subscriber account that can be identified through a legal process - Private IP address: the internal address within a home or office network (typically 192.168.x.x or 10.x.x.x); not directly traceable from the internet without access to the router's NAT logs

Geolocation of IP addresses assigns an approximate physical location (city, ISP, or organization) to a public IP address using databases that map IP address blocks to their registered owners. IP geolocation is approximate — typically to the city level — and can be spoofed or masked by VPNs and proxies.

Packet Sniffing

Packet sniffing (also called network capture or traffic analysis) is the real-time or retrospective capture of network data packets as they traverse a network segment. Tools such as Wireshark capture and decode individual packets, allowing examiners to reconstruct:

  • Website URLs accessed
  • Email content (if unencrypted)
  • File transfers
  • VoIP call data
  • DNS lookups (which reveal domain names even when content is encrypted)

Packet sniffing on a shared network segment typically requires a court order when applied to communications of others. Law enforcement may execute network captures at an ISP or at a suspect's premises under appropriate legal authority.

Firewall Log Analysis

Firewall log analysis examines the logs generated by network firewalls and routers, which record connection attempts, blocked traffic, and allowed connections. Firewall logs typically contain: source IP address, destination IP address, port number, timestamp, and allowed/blocked decision.

Firewall logs can reconstruct the history of a device's network connections even after the device itself has been wiped, because the logs are stored on the network infrastructure rather than on the suspect device. Preserved firewall logs can place a device at a specific IP address at a specific time, correlate with other time-stamped evidence, and identify command-and-control server connections associated with malware activity.


Encryption, Steganography, and the Dark Web

Three concepts often encountered in digital investigations require special mention because they are specifically designed to conceal information.

Encryption in Digital Forensics

Encryption transforms data into an unreadable format using a mathematical key. Without the correct decryption key, encrypted data appears as random bits. Full-disk encryption (BitLocker on Windows, FileVault on macOS) encrypts every sector of a drive; the data is only accessible when the correct password or key is provided at boot time.

For forensic examiners, an encrypted drive presents a significant obstacle:

  • A forensic image can still be made of an encrypted drive — but the image contains only ciphertext
  • If the device was running when seized, the decryption key may be in RAM — making RAM collection before shutdown critical
  • Password cracking using dictionary attacks or brute force may be feasible for short, weak passwords but is computationally infeasible for long, random passwords with current technology
  • Legal tools: courts can compel disclosure of encryption passwords in many jurisdictions; the Fifth Amendment implications are an active area of law

Steganography

Steganography (from Greek: hidden writing) is the technique of concealing information within other, apparently innocuous data. In digital contexts, steganographic tools embed secret data within image files, audio files, or video files by making imperceptible modifications to the file — for example, changing the least significant bits of pixel values in an image. The altered image looks identical to the original to casual inspection; only examination with the correct steganalysis tool or key reveals the hidden content.

Steganography is used to conceal communications and to smuggle data out of secured environments without triggering content inspection systems that look for suspicious keywords.

Forensic detection of steganography uses steganalysis — statistical analysis of digital files to detect the subtle patterns introduced by embedding operations. Files carrying hidden data show statistical anomalies in their bit patterns that can be detected even without knowing the specific steganographic method or key used.

Dark Web Evidence

The dark web refers to internet content that exists on overlay networks — most commonly the Tor network — that require specialized software to access and that route traffic through multiple encrypted layers to conceal the IP addresses of both clients and servers. The dark web hosts marketplaces for illegal goods, communications for criminal enterprises, and infrastructure for cybercrime.

Investigating dark web activity presents unique challenges: - Tor's onion routing conceals source IP addresses through multiple relays - .onion domains are not indexed by standard search engines and are not publicly registered - Transactions typically use cryptocurrency to avoid financial tracing

Forensic approaches include: examining the suspect device for Tor browser installation and configuration files, reviewing browser history and cached .onion URLs, recovering cryptocurrency wallet software and transaction logs, and working with international law enforcement partners who may have infiltrated dark web operations or compromised Tor exit nodes.

Diagram: Metadata Recovery and Timeline Builder

Metadata Recovery and Timeline Builder MicroSim

Type: microsim sim-id: metadata-timeline-builder
Library: p5.js
Status: Specified

Learning Objective: Apply EXIF metadata recovery to extract timestamps and GPS coordinates from digital evidence files and construct an event timeline (Bloom Level 3 — Apply; verb: apply).

Bloom Level: Apply (L3) Bloom Verb: Apply

Canvas layout: - Left panel (~40%): Simulated file browser showing a folder of evidence photographs with file icons - Right panel (~60%): EXIF data readout panel + timeline visualization

Visual elements: - Clickable file icons representing 6-8 simulated evidence photos with different timestamps and locations - EXIF readout showing: Camera Make/Model, Capture Date/Time, GPS Lat/Long, Device Time Zone - Timeline at bottom showing selected photos plotted in chronological order - Mini map showing GPS coordinates of selected photos as map pins

Interactive controls: - Click any photo icon to load its EXIF data into the readout panel - "Add to Timeline" button places the photo on the chronological timeline - "Plot on Map" button shows the GPS coordinates on a simplified map grid - "Export Timeline" displays a formatted text report of all photos in chronological order

Data Visibility Requirements: - Show EXIF data in human-readable format - Show GPS coordinates in both decimal degrees and DMS (degrees/minutes/seconds) format - Highlight time zone inconsistencies in red when detected - Show the sorted chronological timeline as photos are added

Instructional Rationale: An Apply-level objective (extract and use EXIF metadata) requires hands-on practice with simulated evidence files, building a timeline that mirrors real investigative workflow.

Color scheme: Dark forensic blue background, white panels, green for verified data, red for anomalies/inconsistencies.


Key Concepts Review

The following table summarizes the major concepts from this chapter:

Concept Definition
Volatile Data Data that disappears when power is removed; RAM is the primary source
Non-Volatile Data Data that persists after power-off; stored on HDD, SSD, USB, cloud
RAM Collection Memory dump acquired before shutdown; captures encryption keys, active processes
Write-Blocker Hardware device that prevents any write operations to the evidence drive
Bit-Stream Copy Sector-by-sector forensic image including deleted files, slack, and unallocated space
MD5 Hash 128-bit digest used to verify forensic image integrity
SHA-256 Hash 256-bit digest; current standard for new forensic acquisitions
Hash Verification Comparing hash of original drive to hash of image proves bit-perfect copy
EXIF Metadata Embedded image file data: camera, timestamp, GPS coordinates
Digital Timestamps Created/Modified/Accessed file system timestamps; must be correlated against external sources
Geolocation GPS coordinates from EXIF data place a photograph at a physical location
Packet Sniffing Network traffic capture revealing connection history, protocols, and (if unencrypted) content
Firewall Logs Network infrastructure records of connections; persist even after device is wiped
Encryption Mathematical data protection; full-disk encryption requires key for access
Steganography Hidden data embedded imperceptibly within image or audio files
Dark Web Tor-based overlay network requiring specialized access; .onion domains

Challenge: Evidence Priority

An investigator seizes a running laptop suspected of involvement in network-based fraud. The laptop is logged in, the screen is active, and an encrypted communication application is open. The hard drive uses full-disk encryption.

In what order should evidence be collected, and why? What would be lost if the investigator immediately powered off the laptop?

Answer: The correct collection order is: (1) RAM dump first — the logged-in state means the disk encryption key is in RAM; the open communication application may have session keys, received messages, and connection logs only in memory. Collect RAM immediately using a memory acquisition tool before any other action. (2) Network connection documentation — screenshot or record the active network connections (what servers is the application communicating with?). (3) Forensic image of the drive — with the encryption key recovered from RAM, the drive can be decrypted; without it, the image contains only useless ciphertext.

If the investigator powered off immediately: The RAM contents (including the disk encryption key and all volatile evidence) would be lost. The resulting encrypted forensic image would be inaccessible without the key, and the device would provide no useful digital evidence despite containing substantial material. This mistake is irreversible — the evidence cannot be recovered after shutdown.


Case Closed — For Now

Trace raising magnifying glass in celebration Digital evidence is fragile — but it is also pervasive. Every interaction with a device leaves a trace, and trained examiners know exactly where to look and in what order to look. You now understand the volatility hierarchy, the role of hash functions in preserving evidence integrity, the power of metadata in placing people and devices at specific times and locations, and the special challenges of encryption and the dark web. Chapter 16 turns to facial recognition — one of the most powerful and most controversial tools in modern law enforcement. Follow the evidence — face first.

See Annotated References