Can you restore data from a deleted file that was previously emptied?

In summary, deleting a file usually removes only the entry that points to where the file's data lives on the drive; the content itself can still be restored as long as nothing has been written over it. Emptying the file's content before deleting it makes recovery harder, but the old data can often still be retrieved with the right tools. For more thorough erasure, a tool like Eraser can overwrite the file's blocks multiple times with random data. SSDs are a special case: overwriting the content does not necessarily touch the original physical blocks, so full-disk encryption, or a secure-wipe tool from the SSD vendor if one exists, is the safer route.
  • #1
jack action
TL;DR Summary
Does erasing the content of a file before deleting it offer any advantage in making the content unrestorable?
Say you have a large text file on your hard drive that you edit such that its content is fully erased and you save it that way. Then you delete the file. Is the content still on the hard drive?

Usually, deleting a file only deletes the address where the file is on the hard drive and the content of the file can be restored as long as nothing is written over it. But in the case of erasing the content before deleting the file, can the content still be retrieved? If so, would it at least make it a little bit harder to find or would the computer overwrite the erased content faster?
 
  • #2
jack action said:
Say you have a large text file on your hard drive that you edit such that its content is fully erased and you save it that way. Then you delete the file. Is the content still on the hard drive?
You need to specify more exactly what "fully erased and you save it that way" means. The specific answer might well be application dependent.

If you mean you are editing a text file, you delete all of its contents in your editor so it's empty, and then you save the file, that would not erase the file's data on disk. It would just release the blocks where the file's data was previously stored into the free pool, which means the OS would be free to overwrite the data if it needs to store new data. But it could be a while before that happens (it depends on how full your disk is and how much file saving you do).

But not all applications work the same way. Some applications write empty data to disk in place of the old data (meaning, they write to the same blocks on disk where the old data was stored) when you "erase" something. In those cases, the old data would be gone.
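As a rough illustration of the two save patterns described above, here is a minimal Python sketch (the file name is hypothetical, and whether the zeros actually land on the same physical blocks still depends on the filesystem and the drive):

Code:
import os

def empty_by_truncation(path):
    # The usual "save an empty file" path: opening with "w" truncates
    # the file to zero length. The old data blocks are released to the
    # free pool; nothing overwrites the bytes they contained.
    with open(path, "w") as f:
        f.write("")

def empty_by_overwrite(path):
    # Explicit in-place rewrite: "r+b" opens without truncating, so
    # zeros are written over the same byte range the old content used.
    size = os.path.getsize(path)
    with open(path, "r+b") as f:
        f.write(b"\x00" * size)
        f.flush()
        os.fsync(f.fileno())  # push the write toward the device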

jack action said:
Usually, deleting a file only deletes the address where the file is on the hard drive
More precisely, it deletes the tracking entry that points to where the file's data is. Even that is not necessarily straightforward. On Linux, for example, deleting a file just deletes a directory entry; it doesn't necessarily touch the file's inode, which is where the pointer to the actual data is located. The inode gets deleted only if all directory entries that point to it have been deleted.

And even that isn't always the case, since many systems have a "trash can" or something like it, so "deleting" a file doesn't actually mean deleting its directory entry, just moving that entry from the previous directory to the "trash can" directory. To the OS, the file is still pointed to by a directory entry and it still exists as a file. You would have to empty the trash can to actually delete the file as described above.

jack action said:
the content of the file can be restored as long as nothing is written over it.
If you can find the data, yes. The easiest case is if the blocks on the disk that stored the file's old directory entry have not yet been overwritten; then a file restore program can just look at the old directory entry and find the file's data.

It gets harder if the old directory entry has been overwritten by newer data, but it's still not impossible. File restore programs have various tricks to try.
 
  • #3
jack action said:
Usually, deleting a file only deletes the address where the file is on the hard drive and the content of the file can be restored as long as nothing is written over it. But in the case of erasing the content before deleting the file, can the content still be retrieved?
It can still be retrieved unless serious efforts were made to erase the data. Certainly, the FBI has tools to recover files that were simply modified once and deleted. I don't know who else would have those recovery tools.
jack action said:
If so, would it at least make it a little bit harder to find or would the computer overwrite the erased content faster?
Yes, it makes it harder. A casual effort would not recover the data. But if you want to do a better job of erasing data, I recommend using a tool like Eraser by Heidi. It will overwrite a file several times with random bits. There are several options for overwriting the files:
[Screenshot: Eraser's overwrite-method options]

A final option is the total physical destruction of the drive.
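For a rough sense of what an overwrite-before-delete tool does, here is a minimal Python sketch (not Eraser's actual implementation, and, as later posts point out, on SSDs and copy-on-write filesystems the random data may never reach the original physical blocks):

Code:
import os

def overwrite_and_delete(path, passes=3):
    # Overwrite the file's byte range with random data several times,
    # flushing to the device after each pass, then delete the file.
    size = os.path.getsize(path)
    with open(path, "r+b") as f:
        for _ in range(passes):
            f.seek(0)
            f.write(os.urandom(size))  # fine for small files; chunk for large ones
            f.flush()
            os.fsync(f.fileno())
    os.remove(path)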
 
  • #4
Not sure if anyone has mentioned this yet, but for SSDs, overwriting the content does not usually delete the original data; it just writes a similar number of blocks in a different place.
 
  • #5
Filip Larsen said:
Not sure if anyone has mentioned this yet, but for SSDs, overwriting the content does not usually delete the original data; it just writes a similar number of blocks in a different place.
Exactly. I'm sure the same is true of any hard drive. An eraser tool would have to go to the physical blocks of the current file and directly write random bits there.
 
  • #6
I wonder if some SSD vendors have tools that allow secure wipes of either the whole disk or perhaps even selected files? I assume it requires a low-level tool to map and overwrite the physical blocks in use.
 
  • #7
@jack action - are you hoping for data recovery of a file, in this instance?
 
  • #8
Filip Larsen said:
I wonder if some SSD vendors have tools that allow secure wipes of either the whole disk or perhaps even selected files? I assume it requires a low-level tool to map and overwrite the physical blocks in use.
Windows has a full reformat of a drive. It's my understanding that their quick format just cleans out the directory. It takes a long time to do a full reformat of a large drive.
The Eraser tool that I referenced above allows you to overwrite the blocks of an individual file many times. And you can make Eraser an option for the files in Windows Explorer.
 
  • #9
Don't forget about backups of the file. In general, I recommend that any sensitive information be kept in an encrypted drive with a strong password. That way, your backups are as well protected as the original. I use VeraCrypt by IDRIX for that.
 
  • #10
FactChecker said:
I'm sure the same is true of any hard drive.
Not quite. SSDs work differently from previous hard drives. On previous hard drives, although many applications, when "overwriting" a file, actually just write new data to different blocks and then change the file's inode (or whatever the Windows equivalent is) to point to the new blocks instead of the old, it is also perfectly possible, either for applications or the OS itself, to explicitly write new data to the same blocks. (Many database engines, for example, explicitly control writes at the block level, bypassing the OS's filesystem driver.)

On SSDs, however, there is no way for an application or an OS to explicitly write new data to the same blocks. From the application's or the OS's point of view, every write writes a new block. The drive's internal firmware can overwrite blocks in order to garbage collect old blocks that no longer contain active data, but this is not done unless absolutely necessary, in order to preserve the NAND flash memory on the drive as long as possible. There might be some special tool that can force the drive to do this to purposely erase data, but it's not something that is ordinarily done the way applications and OSs could with previous hard drives.
 
  • #11
A workable approach for wiping an SSD, if this requirement is known before filling it with data, is to use it only with full disk encryption. This means the drive (or the partition in question) is effectively wiped if one can wipe just the master key. Alternatively, one can make a logical encrypted disk in a regular file system using a tool like TrueCrypt (or whatever poses as its replacement these days; TrueCrypt was declared no longer safe by its maintainers some years ago).
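A minimal sketch of that "crypto-erase" idea in Python, using the third-party cryptography package (assumed to be installed; the file name is hypothetical). The point is that once the key is gone, the ciphertext on disk is useless regardless of what the drive does with old blocks:

Code:
from cryptography.fernet import Fernet

key = Fernet.generate_key()   # keep this somewhere you can reliably destroy
cipher = Fernet(key)

# Store only ciphertext on disk.
with open("notes.enc", "wb") as f:
    f.write(cipher.encrypt(b"secret to-do list"))

# Reading it back requires the key...
with open("notes.enc", "rb") as f:
    plaintext = cipher.decrypt(f.read())

# ...so destroying the key (e.g. one that lived only in a password
# manager or on a separate token) effectively wipes the file, no matter
# how many stale copies of the ciphertext the SSD keeps internally.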
 
  • #12
The thought came to me because I often write to-do lists in a txt file and delete the items once they're done. In the end, I sometimes delete the last item, leaving an empty file that I then delete. I was wondering if deleting the last items before deleting the file made a difference to the data stored on the disk.

I was thinking that the program used to modify the content may affect the process. But let's keep things simple. Open a file with Python, put an empty string in it, then close the file. Do we really know what happened to the old data? What if I do the same thing with Java instead? What if it is not done with the same OS? Are different programs really using different approaches to how the files are saved? (Not talking about programs made especially for data security.)

Or is it really a new file that is written each time I save it, meaning I have all of my previous versions saved up somewhere until they are written over?
 
  • #13
jack action said:
Open a file with Python, put an empty string in it, then close the file. Do we really know what happened to the old data? What if I do the same thing with Java instead? What if it is not done with the same OS? Are different programs really using different approaches to how the files are saved?
Most programs or programming languages are probably using the standard library functions provided either by the OS or the C library for the OS. (I know Python does.) You would need to look up the documentation or source code for those functions to know for sure exactly what they do.
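For the Python case in particular, a mode of "w" boils down to an open() system call with the O_TRUNC flag. Roughly (a simplified sketch ignoring buffering and error handling, with a hypothetical file name):

Code:
import os

# Approximately what open("todo.txt", "w") does underneath on a POSIX system:
# ask the OS to truncate the file to zero length at open time.
fd = os.open("todo.txt", os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
os.write(fd, b"")   # the "empty string" from the question
os.close(fd)
# The truncation releases the old data blocks back to the filesystem;
# nothing in this code path overwrites the bytes they contained.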
 
  • #14
jack action said:
The thought came to me because I often write to-do lists in a txt file and delete the items once they're done. In the end, I sometimes delete the last item, leaving an empty file that I then delete. I was wondering if deleting the last items before deleting the file made a difference to the data stored on the disk.
It makes some difference. A casual attempt will not get the old version, but a serious effort can get it.
jack action said:
I was thinking that the program used to modify the content may affect the process. But let's keep things simple. Open a file with Python, put an empty string in it, then close the file. Do we really know what happened to the old data? What if I do the same thing with Java instead?
Two different languages running in the same operating system, using high-level system calls, probably do the same thing. That being said, some languages run in a "sandbox" and might interact with the hardware differently.
jack action said:
What if it is not done with the same OS?
Using a different OS can make a difference.
jack action said:
Are different programs really using different approaches to how the files are saved? (Not talking about programs made especially for data security.)

Or is it really a new file that is written each time I save it, meaning I have all of my previous versions saved up somewhere until they are written over?
It is more likely that the decision of what to do when a file is changed is done a block at a time. I doubt that a small change in a huge file would cause a rewrite of the entire file.
I think that it would take a lot of investigating to determine the answers to all these questions and the answers might be different for different files, languages, OS, hardware devices, and device handlers.
 
  • #15
FactChecker said:
I doubt that a small change in a huge file would cause a rewrite of the entire file.
I'm not so sure. Applications in general don't view files as sequences of blocks, but as sequences of bytes. (There are some exceptions, such as the database engines I mentioned before; but a text editor, for example, views files the way I have described.) If the disk has 512 byte blocks, and a change is made that inserts a byte after byte 500 of the file (so bytes 501 and on are now different from what they were before), saving the file in a text editor will result in every block of the file being rewritten to disk, since every block's bytes have changed.
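A quick way to see this is to compare a file's 512-byte blocks before and after inserting a single byte (a toy Python sketch, not anything a real editor does):

Code:
BLOCK = 512

old = b"0123456789abcdef" * 128              # a 2048-byte, 4-block file
new = old[:500] + b"!" + old[500:]           # insert one byte after byte 500

old_blocks = [old[i:i + BLOCK] for i in range(0, len(old), BLOCK)]
new_blocks = [new[i:i + BLOCK] for i in range(0, len(new), BLOCK)]

# Every byte from the insertion point onward has shifted by one, so the
# contents of every block differ (and the file now spills into a fifth,
# partial block).
changed = [i for i, (a, b) in enumerate(zip(old_blocks, new_blocks)) if a != b]
print(changed)   # [0, 1, 2, 3]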
 
  • #16
PeterDonis said:
saving the file in a text editor will result in every block of the file being rewritten to disk, since every block's bytes have changed.
I'll buy that. I suppose this is especially true if the editor encrypts the file, as I have been doing for sensitive information.
 
  • #17
PeterDonis said:
saving the file in a text editor will result in every block of the file being rewritten to disk
Well, a program doing, say, an incremental save of a text file doesn't have to update the whole file; it can seek and write only from the first actual change. That is often how logger libraries work, to avoid O(N²) performance as the file grows.

There are even file systems that allow you to seek and write content with holes, i.e. the file system has the option to skip (some) blocks earlier in the file that have never been written yet.

But back to the question. If I had to use Python to write and later read a file (i.e. a file not needing to be read by, say, a text editor) which I want to be able to securely wipe afterwards, even if the file resides on an SSD or flash drive, then I would consider using encryption. Even the seek-and-write pattern of a logger can be used with block ciphers, at least if a suitable block cipher mode is selected.
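A minimal Python sketch of the seek-and-write save pattern described above (a hypothetical helper, not any particular logger or editor): only the tail of the file from the first changed byte onward is rewritten, and everything before it stays untouched on disk:

Code:
def incremental_save(path, old_text, new_text):
    # Find the first byte that differs and rewrite only from there on.
    old_b, new_b = old_text.encode(), new_text.encode()
    i = 0
    while i < min(len(old_b), len(new_b)) and old_b[i] == new_b[i]:
        i += 1
    with open(path, "r+b") as f:
        f.seek(i)
        f.write(new_b[i:])
        f.truncate()   # drop any leftover tail if the file shrank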
 
  • #18
The application should neither know nor care about disk formats. You do not want to update your app when plugging in a slightly different USB stick.

Security needs to be balanced: what are you protecting, and against whom? Protecting the "honeydew list" one's spouse gave you from attacks by major world governments might be overkill. ("Comrade Boris! Target is painting ceiling beige!")

People want their data to be erased permanently and irrevocably, unless they erased it by mistake; then they want it immediately and easily recoverable. This is not easy to do.

My Linux filesystem does writes by making copies, and when the copy is known to be good, the original is marked for deletion. It makes copies at the block level. However, for a honeydew list, I bet one file is one block.
 
  • #19
Vanadium 50 said:
Security needs to be balanced: what are you protecting, and against whom? Protecting the "honeydew list" one's spouse gave you from attacks by major world governments might be overkill. ("Comrade Boris! Target is painting ceiling beige!")
Ha! Good point.
That being said, I would encrypt financial data such as tax returns. Encryption has the advantage that you can do backups of everything without worrying about the financial data on it. Relying on deleting unencrypted files would require deleting those files on any backup media.
 
  • #20
Filip Larsen said:
a program doing, say, an incremental save of a text file doesn't have to update the whole file; it can seek and write only from the first actual change.
Yes, that's why I specified, in my example, that the change was in the first block, so every block's data would change. The application won't actually know that; all it knows is that it is saving the file's data from byte 501 on (if it uses the incremental save method that you describe, which many editors do). But the filesystem driver will end up actually writing to disk in blocks, and every block's data will be different from what is currently on disk.
 
  • #21
PeterDonis said:
But the filesystem driver will end up actually writing to disk in blocks, and every block's data will be different from what is currently on disk.
But would it write it on different blocks or overwrite the original blocks (even if there are enough blocks available somewhere else)?
 
  • #22
jack action said:
would it write it on different blocks or overwrite the original blocks
I would expect most filesystems and OSs to write it on different blocks. That minimizes the chance for corruption of data since the operation of actually changing the file data then reduces to just updating pointers in the inode (or whatever the Windows equivalent is); you're not actually overwriting any data, you're just shifting pointers from the old data to the new. That means there is never a time when a complete, consistent version of the file's data is not on disk--it's the old version until all of the new blocks are written and the pointers updated, then it's the new version.
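The user-space analogue of that pointer swap is the familiar "write a new file, then rename it over the old one" pattern. A minimal Python sketch (file and function names are hypothetical):

Code:
import os

def atomic_save(path, data):
    tmp = path + ".tmp"
    with open(tmp, "wb") as f:      # the new data goes to brand-new blocks
        f.write(data)
        f.flush()
        os.fsync(f.fileno())        # make sure the new copy is on disk
    os.replace(tmp, path)           # atomically swap the directory entry
    # The old file's blocks are now unreferenced free space; nothing in
    # this path ever overwrites them.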
 
  • #23
jack action said:
But would it write it on different blocks or overwrite the original blocks
The application should neither know nor care. The OS and FS determine this.

On my system, what happens (slightly simplified) is that the new data is written to empty space and checked. If it passes the check, then the pointer to the data is updated from the old data to the new data. This effectively marks the old space as available for new writes.

But there are many other options.
 
  • #24
PeterDonis said:
I would expect most filesystems and OSs to write it on different blocks
Modern, probably. No good reason to do otherwise. Not so much the older ones. Then you had downsides - fragmentation was an issue, and available space was a bigger issue. And, as you might expect, corruption was a bigger issue then.
 
  • #25
You certainly can NOT COUNT on the same block being used. In a modern SSD, the writes are spread out so that no part is used excessively. That extends the useful life of the SSD. I'm not sure if there is any way to control that at the level above the SSD device.
 
  • #26
FactChecker said:
I'm not sure if there is any way to control that at the level above the SSD device.
There is, but why would one want to?

One allocates the entire SSD to one big file and fills it. To write a block, one needs to erase a block, and one can always pick the block to erase. Now there is only one unused block so that's where the write happens.

But why would anyone do that? "What it loses in speed it gains in unreliability."
 
  • #27
Vanadium 50 said:
One allocates the entire SSD to one big file and fills it.
Not sure that will work on a modern SSD. As far as I am aware, they have more physical blocks than are logically available and will automatically remap "worn out" (nearly bad) blocks to fresh blocks in order to maintain available disk space over time, without having to perform OS-level disk recovery. I assume such a "retired" block is near impossible to read from the OS level, but I also assume most vendors have an analysis or recovery tool available that is likely able to read most of the data in those blocks anyway.
 
  • #28
jack action said:
But would it write it on different blocks or overwrite the original blocks (even if there are enough blocks available somewhere else)?
It depends hugely on the operating system, the version of that operating system, and the type of disk media.
In some cases with spinning media, updated data will overwrite existing blocks - thus reducing file fragmentation. But you certainly can't depend on this.

I am not on a Windows system right now, so I can't check it out, but there is a Windows command-line utility "cipher" that will perform an overwrite of unused sectors on any NTFS file system. It overwrites the unused sectors with multiple patterns so that the data on spinning media cannot be recovered even with the most sophisticated techniques.

cipher /w:c:
 
  • #29
Just FYI, GNU coreutils has this one:

shred

"Overwrite the specified FILE(s) repeatedly, in order to make it harder for even very expensive hardware probing to recover the data."Beware of the CAUTION mentioned though.
 
  • #30
sbrothy said:
Just
Look at the copyright on the link: 2016. In 2023 journalling file systems and SSD drives are ubiquitous. If you want to destroy (rather than erase) data, destroy the drive. In a fire.
 
  • #31
FactChecker said:
Windows has a full reformat of a drive. It's my understanding that their quick format just cleans out the directory. It takes a long time to do a full reformat of a large drive.
The Eraser tool that I referenced above allows you to overwrite the blocks of an individual file many times. And you can make Eraser an option for the files in Windows Explorer.
Does a full reformat write Ctrl-Z's (or whatever the operating system uses for null data) to the entire disk and then re-read it?
Or does it just read and rewrite the sector information and set the cluster size?
Some older systems did not rewrite the whole disk on a full format; they just took samples of sector data to verify good/bad bits/sectors.
 
  • #32
pbuk said:
Look at the copyright on the link: 2016. In 2023 journalling file systems and SSD drives are ubiquitous. If you want to destroy (rather than erase) data, destroy the drive. In a fire.
I'm obviously a dinosaur.

Incidentally, I had an extremely disgusting talk with a self-confessed pedophile when I was a moderator on a filesharing network (DC++) a long time ago (I knew which keywords to look for, I never downloaded anything from him.) I soon had the authorities on the phone trying to track him down but he was a slippery customer. They never found him even though I willingly gave them all the data I had. They didn't care one bit about our piracy.

Point is:

He claimed he had a remote-controlled magnesium block, rigged to burn, lying on top of his hard-disk stack.

Talk about a nasty experience. Obviously I don't know if he was telling the truth but he seemed very smug and perversely proud of himself.

I had to take a very long shower afterwards.
 
  • #33
256bits said:
Does a full reformat write Ctrl-Z's (or whatever the operating system uses for null data) to the entire disk and then re-read it?
Or does it just read and rewrite the sector information and set the cluster size?
Some older systems did not rewrite the whole disk on a full format; they just took samples of sector data to verify good/bad bits/sectors.
I'm unsure of the details, but several people (of unknown expertise) say that the full format makes the data unrecoverable (see e.g. this). The full format certainly takes a very long time, as though something is being written to the entire disk.
 
  • #34
FactChecker said:
I'm unsure of the details, but several people (of unknown expertise) say that the full format makes the data unrecoverable (see e.g. this). The full format certainly takes a very long time, as though something is being written to the entire disk.
There’s a difference here between magnetic media and flash. Flash, once physically overwritten, is truly gone. Magnetic media can be overwritten several times but a well-funded attacker (in practice, that means a national intelligence agency) with a scanning electron microscope and possession of the media will still be able to recover much of the old data.
 
  • #35
pbuk said:
In a fire.
Drill press!

I will make the same comment I have elsewhere - are you trying to keep your little sister out, or major world governments out?

If the latter, I would reply with two statements:
1. You're not that important.
2. Never, ever, ever write anything to the drive that isn't encrypted.
 
