Windows May Be Storing All Your Email and Docs as Unencrypted Plaintext
Forensic analyst and data recovery expert Barnaby Skeggs has discovered a troubling design decision in Windows 8.1 and Windows 10 that could expose certain users to data theft without their knowledge. While analyzing a system to determine whether a particular email had ever existed on a computer, Skeggs found a record of the email's subject line in an unusual file: WaitList.dat. He writes:
I identified the ‘WaitList.dat’ artefact while investigating a Windows 8.1 PC for the presence of a known email. I was provided with a copy of this email, and part of the investigation involved identifying whether or not this email ever existed on the custodian’s computer. After processing the .PST and .OST mailbox archives on the PC, I did not identify the existence of the email. I then processed shadow copies, carved and processed for various mailbox stores and email files, and still did not identify the email. As a final attempt, I ran a string search for the email subject line across the whole forensic image. I received 1 hit within ‘WaitList.dat’. Investigation of this 140mb file identified metadata, and full body text of over 36’000 emails and documents, spanning back 3 years.
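Skeggs's last resort, a raw string search across the entire disk image, is simple to approximate. The Python sketch below is only an illustration, not Skeggs's tool. The function name find_string_hits and the 1 MB chunk size are arbitrary choices. It also assumes the text may be stored either as ASCII/UTF-8 or as UTF-16LE, the encoding Windows commonly uses internally. The function reads the image in chunks and carries a short tail forward, so a match that straddles two chunks isn't missed.

```python
def find_string_hits(path, needle, chunk_size=1 << 20):
    """Return sorted byte offsets where `needle` occurs in the file at `path`,
    encoded either as UTF-8 (ASCII-compatible) or as UTF-16LE."""
    if not needle:
        raise ValueError("needle must be a non-empty string")
    # Assumption: text on disk is stored as UTF-8/ASCII or UTF-16LE.
    patterns = {needle.encode("utf-8"), needle.encode("utf-16-le")}
    # Bytes to carry between chunks so boundary-spanning matches are caught.
    overlap = max(len(p) for p in patterns) - 1
    hits = set()  # a set de-duplicates matches re-found in the carried tail
    with open(path, "rb") as f:
        base = 0   # absolute file offset of buf[0]
        buf = b""
        while True:
            chunk = f.read(chunk_size)
            if not chunk:
                break
            buf += chunk
            for pat in patterns:
                start = 0
                while (i := buf.find(pat, start)) != -1:
                    hits.add(base + i)
                    start = i + 1
            # Keep only the last `overlap` bytes for the next iteration.
            keep = min(overlap, len(buf))
            base += len(buf) - keep
            buf = buf[len(buf) - keep:]
    return sorted(hits)
```

Calling find_string_hits("evidence.dd", "Q3 budget review") (both arguments hypothetical) would return the byte offset of every hit in the image. An examiner could then inspect those offsets to see which file, such as WaitList.dat, contains them.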
WaitList.dat isn’t a file you’ll find on every Windows 10 system; my own rig, for example, lacks it. It only appears if you’ve enabled handwriting recognition in Windows 8.1 or Windows 10. Skeggs details how the handwriting recognition system works in both versions: the Input Personalization System (IPS) collects user data, which a “Text trainer” then tunes and stores in “lexicon blobs.” Microsoft claims the system continually improves handwriting accuracy, and it may, but WaitList.dat doesn’t just collect handwritten information.
WaitList.dat contains Outlook emails, contact information, and the contents of various types of documents: the body text of each file, along with metadata such as its date and time, document ID, and the company the file originated from. Skeggs writes:
WaitList will store multiple indexes for a single document over time. This provides a forensic examiner the ability to view historical iterations of a file, even when shadow copy is not enabled, or when the file has been deleted/wiped from the hard drive… An email or document can be recorded in WaitList without being read or opened by the user.
Because data stored within WaitList.dat isn’t deleted when the underlying documents are removed, the file can also be used to recover information from a PC. WaitList.dat is populated by the Windows Search Indexer. Skeggs has written a Python program, WLrip, that exports the file’s data to TXT files, one per entry, and reports the metadata in a separate CSV file. Both the utility and his full report are available for download.
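The export layout described for WLrip (one TXT file per entry, metadata in a single CSV) is easy to picture with a short sketch. This is not WLrip's code. It skips parsing WaitList.dat entirely and assumes the entries have already been decoded into a hypothetical WaitListEntry record. The fields doc_id, timestamp, and body, the function export_entries, and the file names entry_00000.txt and metadata.csv are all illustrative inventions.

```python
import csv
import os
from dataclasses import dataclass


@dataclass
class WaitListEntry:
    # Hypothetical record for illustration; the real WaitList.dat layout
    # is not modeled here.
    doc_id: str
    timestamp: str
    body: str


def export_entries(entries, out_dir):
    """Write each entry's body to its own TXT file and all metadata to one CSV.
    Returns the path of the CSV file."""
    os.makedirs(out_dir, exist_ok=True)
    csv_path = os.path.join(out_dir, "metadata.csv")
    with open(csv_path, "w", newline="", encoding="utf-8") as meta:
        writer = csv.writer(meta)
        writer.writerow(["index", "doc_id", "timestamp", "txt_file"])
        for i, entry in enumerate(entries):
            # Name files by entry index, not doc_id: the same document can
            # appear in several entries over time.
            txt_name = f"entry_{i:05d}.txt"
            with open(os.path.join(out_dir, txt_name), "w", encoding="utf-8") as f:
                f.write(entry.body)
            writer.writerow([i, entry.doc_id, entry.timestamp, txt_name])
    return csv_path
```

Naming each TXT file by its position rather than by document ID matters here because, as Skeggs notes, WaitList keeps multiple indexes of a single document over time. Keying on the ID would make later versions overwrite earlier ones.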
There’s no word on why Microsoft thought it was a good idea to build a handwriting recognition system that functioned in part by building a comprehensive index of every document on a PC. While this will only affect systems with handwriting recognition enabled, the fact that this happened at all is concerning given the indiscriminate nature of the data collection.