BitCurator incorporates a number of technologies and standards that may not be familiar to practitioners in the archives and library community. If you don't see the answer you're looking for here, let us know on the BitCurator Users listserv.


Using BitCurator

I've downloaded the BitCurator virtual machine. What do I do with this .tar.gz file?

If your host machine is running Windows, you may need a third-party tool such as 7-Zip to extract the contents of this file. You can download 7-Zip here.

Once you've installed 7-Zip, right-click on the .tar.gz file and click "Extract here". This produces a .tar file. Right-click on that .tar file and click "Extract here" again. This will produce a folder containing the latest BitCurator virtual machine image.

Once you've extracted the contents, you can add the machine in VirtualBox by choosing Machine->Add in the VirtualBox menu, and navigating to the location where you've extracted the .vdi and .vbox files.

On a Mac, you can simply double-click the .tar.gz file, and the built-in extractor should take care of extracting the contents. Then you can add the VM in VirtualBox using the same method.
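If you would rather script the extraction than use a graphical tool, the same two-step unpacking can be done in one pass on any platform with Python installed. The following is a minimal sketch, not an official BitCurator utility; the function name is hypothetical, and you supply the archive and destination paths yourself.

```python
import tarfile
from pathlib import Path


def extract_archive(archive_path, dest_dir):
    """Extract a .tar.gz archive into dest_dir and return the extracted paths."""
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    with tarfile.open(archive_path, "r:gz") as tar:
        names = tar.getnames()
        # Newer Python versions provide extraction filters that refuse
        # unsafe member paths; use the "data" filter when it is available.
        if hasattr(tarfile, "data_filter"):
            tar.extractall(dest, filter="data")
        else:
            tar.extractall(dest)
    return [dest / name for name in names]
```

Calling `extract_archive` with the path to the downloaded .tar.gz file and a destination folder should leave the .vdi and .vbox files inside that folder, ready to be added through Machine->Add.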

I tried to download the BitCurator virtual machine, but it's taking forever (over an hour), won't decompress, or is otherwise corrupted.

This is an issue caused by high download traffic that we are working to fix. If you've been downloading the virtual machine for over an hour, stop the download (how you do this differs by browser, but there is usually an X or "stop downloading" link near the progress bar for your download). Then retry the download by clicking on the download link again. Note that we also offer an "iBiblio mirror" link for downloading the exact same file, which we recommend trying if the main link fails.
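Before retrying, it can help to confirm whether the file you already have is actually damaged. The sketch below reads every member of the archive from start to finish and reports whether that succeeds. It is an illustrative check only (the function name is hypothetical), and it detects truncated or unreadable archives rather than proving the file is identical to the published one.

```python
import tarfile
import zlib


def archive_is_readable(archive_path):
    """Return True if every file in the .tar.gz archive can be read completely."""
    try:
        with tarfile.open(archive_path, "r:gz") as tar:
            for member in tar:
                if member.isfile():
                    fileobj = tar.extractfile(member)
                    # Read in 1 MiB pieces so large images do not fill memory.
                    while fileobj.read(1024 * 1024):
                        pass
        return True
    except (tarfile.TarError, EOFError, OSError, zlib.error):
        # Truncated downloads and non-gzip files typically end up here.
        return False
```

If this returns False for your download, the file is incomplete or corrupted and should be downloaded again.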

I've added the BitCurator VM using VirtualBox, but the virtual machine won't start (or crashes when it tries to boot)

There are a number of reasons this could be happening. If you're attempting to run BitCurator on a machine with less than 4 GB of RAM, the default settings could be causing the host to lock up. Check the Oracle VM VirtualBox Manager to see if any of the tabs under "Settings" indicate "non-optimal settings detected". If you're attempting to run BitCurator on a machine running a 32-bit operating system, please note that we do not test or support this configuration. Finally, some PC laptops with Intel processors ship with hardware-assisted virtualization extensions disabled in the BIOS, which may affect your ability to run 64-bit guest operating systems in VirtualBox on Windows. If this is the case, you will need to reboot the machine and change the BIOS settings to enable Intel VT-x extensions.

I mounted a USB drive and lost control of my mouse cursor!

This is a known issue with VirtualBox (not BitCurator) when both a USB mouse and USB drive are plugged into the computer. Shut down BitCurator and unplug the USB drive; restart BitCurator. Once BitCurator has fully started up, plug the USB drive in again; both the drive and your mouse should now work. Alternatively, on some Windows systems you may need to right-click on the Oracle VirtualBox icon and choose "Run as Administrator" when starting up VirtualBox.

The BitCurator virtual machine is built for VirtualBox. Can I use VMware instead?

The BitCurator VM is shipped as a .vdi, which is VirtualBox-specific. You can convert a .vdi to .vmdk (the VMware native format) by following the instructions on this page. An easier and more reliable alternative is to create a new VMware-specific VM using the BitCurator ISO (download here). In fact, this is how we create the BitCurator VM in the first place.
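One commonly used conversion route, which may differ from the linked instructions, is the `VBoxManage clonemedium` command that ships with VirtualBox. The sketch below only wraps that command; it assumes VBoxManage is installed and on your PATH, and the helper names and file paths are hypothetical.

```python
import subprocess


def build_convert_command(vdi_path, vmdk_path):
    """Build the VBoxManage command that clones a .vdi disk into VMDK format."""
    return [
        "VBoxManage", "clonemedium", "disk",
        str(vdi_path), str(vmdk_path),
        "--format", "VMDK",
    ]


def convert_vdi_to_vmdk(vdi_path, vmdk_path):
    """Run the conversion; raises CalledProcessError if VBoxManage fails."""
    subprocess.run(build_convert_command(vdi_path, vmdk_path), check=True)
```

The original .vdi is left in place, so you can fall back to building a fresh VM from the ISO if the converted disk does not boot.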

File Systems and Disk Image Formats

Disk imaging isn't part of my institution's preservation plan or mandate. Why should I image a disk or device if we're just going to copy out the useful files, save an original version, and serve out a migrated rendition?

Forensic disk imaging assists in providing a complete provenance record. Even if the image is not retained, it allows the user to store information about how files were acquired, where and when they originated, under what circumstances they were transferred, and relevant environmental data concerning their production.

It's difficult to know ahead of time what is actually on a disk, unless you're sitting next to a donor who is walking you through materials they expect to provide you with at some point in the future. Even then, the donor may be unaware of hidden and deleted (or partially deleted) data on their system disk or removable media, or may inadvertently provide you with malware-infected materials.

Disk imaging provides an opportunity to establish a ground truth about the materials you are processing for ingest into a digital archive. Imaging procedures that use hardware write-blocking allow you to identify potential damage and corruption on the source media, and ensure that there is always an unchanged source with which to perform further work.
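One common way to confirm that a working copy still matches the image you originally captured is to compare cryptographic hashes. The sketch below is an illustration rather than part of BitCurator itself; the function names are hypothetical, and SHA-256 is an assumed choice of algorithm.

```python
import hashlib


def hash_image(image_path, algorithm="sha256", chunk_size=1024 * 1024):
    """Return the hex digest of a file, reading it in chunks to limit memory use."""
    digest = hashlib.new(algorithm)
    with open(image_path, "rb") as image:
        for chunk in iter(lambda: image.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def images_match(first_path, second_path):
    """Return True if both files have the same digest."""
    return hash_image(first_path) == hash_image(second_path)
```

Recording the digest at capture time and recomputing it later gives a simple test of whether the stored image has changed.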

What file systems does BitCurator support?

BitCurator currently supports FAT16, FAT32, NTFS, HFS, HFS+, and ext2, ext3, and ext4. The BitCurator virtual environment can recognize and provide browsable access to additional file systems, but many of the tools that we rely on (including fiwalk for producing file system hierarchical metadata) are currently limited to these file systems. Note that stream-based tools, such as bulk_extractor, will attempt to extract relevant features irrespective of the underlying file system, although the effectiveness of the feature extractors will be limited in cases where the underlying file formats are not recognized.
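To illustrate what "stream-based" means here, the sketch below scans the raw bytes of an image for email-like strings without interpreting any file system structures. It is a simplified illustration of the general idea and not how bulk_extractor is implemented; the pattern, function name, and parameters are assumptions, and features longer than `overlap` bytes may be missed or cut short.

```python
import re

# Deliberately simple pattern for email-like byte sequences.
EMAIL = re.compile(rb"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def scan_for_emails(image_path, chunk_size=1 << 20, overlap=256):
    """Return (byte_offset, text) pairs for email-like strings in a raw stream."""
    results = []
    with open(image_path, "rb") as stream:
        tail = b""        # bytes carried over from the previous read
        tail_offset = 0   # absolute offset of the first byte in the buffer
        reported_end = 0  # absolute end offset of the last reported match
        while True:
            chunk = stream.read(chunk_size)
            last = not chunk
            buffer = tail + chunk
            # Matches starting in the final `overlap` bytes are deferred to
            # the next pass, where they can be seen in full.
            limit = len(buffer) if last else len(buffer) - overlap
            for match in EMAIL.finditer(buffer):
                if match.start() >= limit:
                    break
                start = tail_offset + match.start()
                if start < reported_end:
                    continue  # fragment of a match that was already reported
                results.append((start, match.group().decode("ascii")))
                reported_end = tail_offset + match.end()
            if last:
                break
            keep = min(overlap, len(buffer))
            tail_offset += len(buffer) - keep
            tail = buffer[-keep:]
    return results
```

Because the scan never looks at directory structures, it behaves the same on a supported file system, an unsupported one, or unallocated space. It will not, however, see text inside compressed or encoded file formats that it does not decode.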

What's the advantage of saving my disk images as AFF or E01 rather than raw?

Both the Advanced Forensic Format (AFF) and Expert Witness Format (E01) file formats supported by BitCurator store bitstreams in compressed form, and incorporate metadata about the capture process and device configuration. There are some known issues with AFF when working with heavily fragmented NTFS volumes, and the upstream developer supports the use of E01, which has a fully open library for read and write access.


What's a hardware write-blocker? Do I really need one?

A hardware write-blocker is a device that connects to your host machine and prevents inadvertent changes to writable media. Modern operating systems can make changes to media at the time of connection, even if you do not issue an explicit command or take any action within the operating system. Detailed information is available at the Forensics Wiki. We recommend that you use a hardware write-blocker with all writable media, in order to prevent hidden, accidental, and malicious changes. We do not make specific hardware recommendations; the BitCurator project has successfully tested Tableau USB write-blockers, Tableau Ultrabays, and Digital Intelligence read-only switchable 3.5" floppy drives with a variety of media.

NIST has prepared a series of technical reports on tests of software write blockers. The reports can be found on the software write block tools page.

I work at a small institution with limited resources. Am I going to need an expensive new dedicated workstation just to do digital forensics work?

In general, no. Modern digital forensics workstations incorporate features such as built-in write-blockers, hard-disk cooling trays, and dedicated storage media for local artifacts and databases in order to support a high level of performance and low level of failure in risk-reduced environments. Most of the functionality can be replicated on standard desktop hardware (and to a lesser degree on laptops).

We're having trouble reading a specific legacy magnetic media object. Can BitCurator help?

BitCurator does not provide specific instruction or guidance for working with magnetic media. There are a number of projects that handle floppy drive interfaces. In particular, these include the CatWeasel (available at some vendors but largely defunct), KryoFlux (currently available via the European distributor but of closed design), and the DiscFerret (recently developed, fully open, but currently only available to developers). Given the appropriate floppy interface, software used by BitCurator can read raw data on any required media.


My institution is already using a commercial digital forensics product. Is there any advantage to using open source digital forensics software alongside this product?

BitCurator incorporates some software that has specific performance advantages over currently available commercial packages. This includes bulk_extractor, which can run feature detectors as individual threads on multi-core systems and does not rely directly on a database backend. In practice, this means that bulk_extractor typically runs many times faster than most commercially available solutions when identifying personally identifying and private information in data streams.

Does BitCurator provide support for file format verification?

Not currently. We expect that most users will use the BitCurator software environment for its primary purpose, capturing and analyzing disk images, and will use mature community-supported tools within supporting preservation environments or as local microservices to perform additional verification on materials of archival or preservation interest.

What's the advantage of using the BitCurator virtual machine environment over installing the dependencies, software, and scripts by myself?

The BitCurator wiki provides information on acquiring, compiling, and installing most of the relevant features incorporated into the virtual environment.

Will BitCurator help me crack passwords on encrypted files in my collection?

Not currently. There are many good commercial and open source tools to help you do this.

What software DOES BitCurator currently include?

Please see this section of our Software Information page for the latest list of included tools, with brief descriptions of each and links to more information on each tool.


Is BitCurator intended to be a data preservation environment?

No. BitCurator is intended to support existing long-term data preservation environments, both as a data triage system and as a provider of software that may be integrated as microservices into existing toolchains. BitCurator depends on and produces only open source and public domain software, in order that the technologies may be fully integrated into existing Free data management and preservation environments such as Archivematica.

The BitCurator tools produce a lot of XML metadata. How do I transform this into a preservation format I can use?

Most of the tools incorporated into BitCurator produce or reprocess Digital Forensics XML (DFXML), described in further detail on the Forensics Wiki. DFXML incorporates technical metadata tags within a framework based on Dublin Core. Relevant sections of DFXML can be readily crosswalked to desired standards.
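As a rough illustration of such a crosswalk, the sketch below reads the `fileobject` entries from a DFXML document and maps three of their child elements (`filename`, `filesize`, `hashdigest`) onto simple records. The target field names ("title", "extent", "identifier") are an illustrative choice loosely modeled on Dublin Core terms, not an endorsed mapping, and the function names are hypothetical. Tags are matched by local name so the code does not depend on a particular namespace declaration.

```python
import xml.etree.ElementTree as ET


def local_name(tag):
    """Strip any '{namespace}' prefix from an element tag."""
    return tag.rsplit("}", 1)[-1]


def crosswalk_dfxml(xml_text):
    """Map each DFXML fileobject to a small dictionary of descriptive fields."""
    root = ET.fromstring(xml_text)
    records = []
    for element in root.iter():
        if local_name(element.tag) != "fileobject":
            continue
        record = {}
        for child in element:
            name = local_name(child.tag)
            text = (child.text or "").strip()
            if name == "filename":
                record["title"] = text
            elif name == "filesize" and text.isdigit():
                record["extent"] = int(text)
            elif name == "hashdigest":
                # A file may carry several digests (for example MD5 and SHA-1).
                algorithm = child.get("type", "unknown").lower()
                record.setdefault("identifier", []).append(f"{algorithm}:{text}")
        records.append(record)
    return records
```

The resulting records can then be serialized into whatever schema your preservation system expects. A production crosswalk would need to cover many more DFXML elements, such as timestamps and allocation status.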