BitCurator and Archival Workflows

This page provides an overview of the four basic areas of archival workflow with born-digital materials that BitCurator supports: forensic disk imaging, data triage, identification of private and sensitive information, and metadata export.

Forensic Disk Imaging

BitCurator uses open source disk imaging tools, including Guymager and dcfldd, to capture bit-identical images from magnetic, optical, solid-state, and hybrid media. Disk images can be captured in various formats, including raw (just the bitstream), E01 (Expert Witness Format, which we support using the open source libewf library), and AFF (Advanced Forensic Format).

Capturing disk images in forensic formats such as E01 and AFF provides several advantages. These images can be stored compressed or uncompressed, split across multiple storage containers, and parsed at the filesystem level without first extracting the raw image, and they embed provenance and capture metadata alongside the bitstream. Forensic images help ensure that no inadvertent changes are made during pre-ingest chain of custody, and they provide a consistent baseline for generating different types of access materials.
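One common way to confirm that an image has not changed since capture is to compare a cryptographic digest recorded at capture time with one computed later. The sketch below is a minimal illustration for raw images using only the Python standard library; the function names hash_image and verify_image are hypothetical and are not part of BitCurator. Note that hashing an E01 or AFF container file is not the same as hashing the bitstream it holds, since those formats wrap the data with their own metadata and checksums.

```python
import hashlib


def hash_image(path, algorithm="sha256", chunk_size=1024 * 1024):
    """Return the hex digest of a file, reading it in fixed-size chunks."""
    digest = hashlib.new(algorithm)
    with open(path, "rb") as stream:
        # Read in chunks so that large images never have to fit in memory.
        for chunk in iter(lambda: stream.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_image(path, expected_hex, algorithm="sha256"):
    """Return True if the file still matches the digest recorded at capture."""
    return hash_image(path, algorithm) == expected_hex.lower()
```

A typical use would be to store the digest returned by hash_image alongside the image at capture, then call verify_image before each later processing step.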

In the BitCurator environment, forensically packaged disk images can be mounted by right-clicking on them and selecting "Mount Disk Image". This allows you to browse the source in a safe, read-only environment. (More detailed instructions are here.)

BitCurator imaging procedure

Forensic Processing and Identification of Potentially Sensitive Information

BitCurator incorporates file system analytics and stream-based forensics technologies to identify and assist with redaction of potentially private and sensitive information (including personally identifiable information). In particular, BitCurator incorporates fiwalk, a tool originally developed by Simson Garfinkel (and later incorporated into The Sleuth Kit), to export an XML file detailing the file system hierarchy within a disk image, including files and folders, deleted materials, and information in slack space.
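To give a sense of how such an XML report can be consumed, the following sketch walks a small DFXML-style document and lists each file entry. The sample document is hand-written and heavily simplified for illustration; the element names it uses (fileobject, filename, filesize, alloc) are assumptions that should be checked against the output of the fiwalk version you actually run. The helper names are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hand-written, simplified sample. Real reports contain many more fields.
SAMPLE_DFXML = """
<dfxml xmlns="http://www.forensicswiki.org/wiki/Category:Digital_Forensics_XML">
  <volume>
    <fileobject>
      <filename>letters/draft.doc</filename>
      <filesize>24576</filesize>
      <alloc>1</alloc>
    </fileobject>
    <fileobject>
      <filename>letters/old.doc</filename>
      <filesize>1024</filesize>
      <unalloc>1</unalloc>
    </fileobject>
  </volume>
</dfxml>
"""


def local_name(tag):
    """Strip any '{namespace}' prefix from an element tag."""
    return tag.rsplit("}", 1)[-1]


def list_fileobjects(dfxml_text):
    """Return one dict per fileobject element found in the document."""
    root = ET.fromstring(dfxml_text.strip())
    records = []
    for element in root.iter():
        if local_name(element.tag) != "fileobject":
            continue
        # Collect the direct children as a name -> text mapping.
        fields = {local_name(c.tag): (c.text or "").strip() for c in element}
        records.append({
            "filename": fields.get("filename", ""),
            "filesize": int(fields.get("filesize") or 0),
            # Assumption: allocated entries carry <alloc>1</alloc>.
            "allocated": fields.get("alloc") == "1",
        })
    return records
```

Matching on local names rather than a fixed namespace keeps the sketch tolerant of reports that declare a different namespace URI or none at all.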

BitCurator includes Garfinkel's bulk_extractor (and the bulk_extractor GUI, BEViewer) to assist users in finding potentially private and sensitive information. The bulk_extractor tool employs stream-based forensics (analyzing the disk image at the block level), executing parsers to identify features such as email addresses, geolocation metadata, and credit card numbers (among many others; see this page for a detailed description of what you can search for and why you might wish to).
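As a rough illustration of how the results of such a scan might be reviewed, the sketch below parses text in the style of a bulk_extractor feature file and counts how often each feature occurs. It assumes the commonly documented layout of comment lines beginning with "#" followed by tab-separated offset, feature, and context fields; the sample data is invented and the function names are hypothetical. Offsets are kept as strings because they may be compound paths rather than plain numbers.

```python
from collections import Counter

# Invented sample in the assumed feature-file layout.
SAMPLE_FEATURES = (
    "# Feature-Recorder: email\n"
    "1048576\talice@example.org\tFrom: alice@example.org\n"
    "2097664\tbob@example.org\tcc: bob@example.org;\n"
    "5243000\talice@example.org\tReply-To: alice@example.org\n"
)


def parse_feature_lines(text):
    """Return (offset, feature, context) tuples, skipping comment lines."""
    features = []
    for line in text.splitlines():
        if not line or line.startswith("#"):
            continue
        parts = line.split("\t", 2)
        if len(parts) < 2:
            continue  # Not a feature line in the assumed layout.
        context = parts[2] if len(parts) > 2 else ""
        features.append((parts[0], parts[1], context))
    return features


def count_features(features):
    """Count how many times each distinct feature value appears."""
    return Counter(feature for _, feature, _ in features)
```

Sorting the resulting counts with most_common() gives a quick view of which addresses or numbers dominate an image, which can help decide where closer review is needed.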

Running open source forensic tools can be problematic if you're not familiar with the command line or forensic workflows. BitCurator automates the process by providing an integrated front-end graphical interface (the BitCurator Reporting Tool) that chains together the tools that will be most useful to you in processing digital materials.

BitCurator Potentially Sensitive Information/PII identification

Data Triage

BitCurator includes software that assists users with data triage: identifying and prioritizing important information in raw and forensically packaged disk images. This includes file format identification, location of deleted files and file fragments, cryptographic hashing, and reporting on potentially private and personally identifying information. BitCurator includes Vassil Roussev's sdhash to report on file similarity, RegRipper for Windows registry analysis, and ClamAV for virus and malware detection.
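File format identification is often based on "magic" byte signatures at the start of a file. The sketch below shows the general idea with a handful of well-known signatures. It is a simplified illustration, not the method used by any particular tool in BitCurator; production identification tools rely on much larger signature databases and additional heuristics. The table and function names are hypothetical.

```python
# A few well-known leading byte signatures, checked in order.
MAGIC_SIGNATURES = [
    (b"%PDF-", "PDF document"),
    (b"\x89PNG\r\n\x1a\n", "PNG image"),
    (b"\xff\xd8\xff", "JPEG image"),
    (b"GIF87a", "GIF image"),
    (b"GIF89a", "GIF image"),
    (b"PK\x03\x04", "ZIP container (also used by DOCX, ODT, EPUB)"),
]


def identify_format(header):
    """Return a format label for the leading bytes of a file."""
    for signature, label in MAGIC_SIGNATURES:
        if header.startswith(signature):
            return label
    return "unknown"


def identify_file(path, header_size=16):
    """Read only the first few bytes of a file and identify its format."""
    with open(path, "rb") as stream:
        return identify_format(stream.read(header_size))
```

Because the check uses only the first bytes of the content, it still works when a file has a missing or misleading extension, which is common for material recovered from older media.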

BitCurator triage facilities

Metadata Export

BitCurator generates technical metadata in the form of Digital Forensics XML (DFXML). DFXML has been developed around a set of digital forensics tools that can both consume and produce a common set of tags; it has an evolving schema and tag library, and is well documented. BitCurator also generates PREMIS metadata for each digital forensics tool that is applied to a disk image, providing an accurate record of provenance for each stage of processing.
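To make the idea of a per-tool provenance record more concrete, the sketch below builds a minimal PREMIS-style event element for one processing step. It uses element names from the PREMIS 3 data dictionary but has not been validated against the schema, and it is not a reproduction of the records BitCurator itself writes. The function name, identifier values, and the choice of "local" and "filename" as identifier types are all assumptions made for illustration.

```python
import xml.etree.ElementTree as ET
from datetime import datetime, timezone

PREMIS_NS = "http://www.loc.gov/premis/v3"


def build_premis_event(event_id, event_type, tool_name, image_name, when=None):
    """Return a PREMIS-style <event> element serialized as a string."""
    when = when or datetime.now(timezone.utc)
    ET.register_namespace("premis", PREMIS_NS)

    def q(name):
        # Qualify an element name with the PREMIS namespace.
        return "{%s}%s" % (PREMIS_NS, name)

    event = ET.Element(q("event"))
    identifier = ET.SubElement(event, q("eventIdentifier"))
    ET.SubElement(identifier, q("eventIdentifierType")).text = "local"
    ET.SubElement(identifier, q("eventIdentifierValue")).text = event_id
    ET.SubElement(event, q("eventType")).text = event_type
    ET.SubElement(event, q("eventDateTime")).text = when.isoformat()
    detail = ET.SubElement(event, q("eventDetailInformation"))
    ET.SubElement(detail, q("eventDetail")).text = "Tool: " + tool_name
    # Link the event to the disk image it was run against.
    linking = ET.SubElement(event, q("linkingObjectIdentifier"))
    ET.SubElement(linking, q("linkingObjectIdentifierType")).text = "filename"
    ET.SubElement(linking, q("linkingObjectIdentifierValue")).text = image_name
    return ET.tostring(event, encoding="unicode")
```

Emitting one such event per tool run, each with its own timestamp and a link back to the same image, yields the stage-by-stage provenance trail described above.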

BitCurator metadata export