Disappearing Azure Data Lake Files?

tl;dr

If you upload files from the Azure Data Lake Explorer in Visual Studio, it sets an expiration time on the uploaded files. The expiration appears to be about one week after the upload time, and once it passes, the file disappears.

Since the Azure Data Lake Explorer in the current Visual Studio tooling has no option to change the expiration time or to prevent it from being set, you should probably avoid using it to upload files for now, unless expiring files are what you want.

If you’re using the Azure Data Lake Tools in Visual Studio Code, uploads from there do NOT set an expiration time.

Want to see this changed? Vote on the UserVoice suggestion.

The rest of the story

Yesterday, a friend pinged me to ask if I’d ever seen files just up and disappear from an Azure Data Lake Store.

“Uhh, what…” was basically my response.

According to them, some U-SQL scripts they had stored in the data lake had simply vanished.

At first, they thought someone was being careless, but that turned out not to be the case.

They had even opened a case with Microsoft, because their Data Factories depended on those scripts. As things stood, those Data Factories were failing, causing all kinds of issues.

I’d never seen this problem, so I was at a loss. I reached out to a couple of online resources I had access to, but no one there had seen it either.

A little while later, they pinged me again to say they had found the culprit: the way they were uploading the files from Visual Studio.

Currently, the Azure Data Lake Tools in Visual Studio will add an expiration time of about one week to any file it uploads.

There is no option to change or disable the expiration on upload.
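One way to spot the problem before a file vanishes is to look at its metadata. The sketch below assumes the WebHDFS-style GETFILESTATUS response from an Azure Data Lake Store (Gen1) account, whose FileStatus object carries an msExpirationTime field (milliseconds since the Unix epoch, 0 when no expiration is set) alongside the standard modificationTime field. The helper names and the sample values are hypothetical, and fetching the status itself (authentication and the HTTP call) is left out.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

# Unix epoch in UTC; the FileStatus timestamps are assumed to be milliseconds since this instant.
EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def expiration_of(file_status: dict) -> Optional[datetime]:
    """Return when the file is scheduled to expire, or None if no expiration is set.

    Assumes `file_status` is the inner "FileStatus" object of a GETFILESTATUS
    response, with msExpirationTime in milliseconds and 0 meaning "never expires".
    """
    ms = file_status.get("msExpirationTime", 0)
    if not ms:
        return None
    return EPOCH + timedelta(milliseconds=ms)


def lifetime_after_upload(file_status: dict) -> Optional[timedelta]:
    """Return how long after its last modification the file will expire, if at all."""
    expires = expiration_of(file_status)
    if expires is None:
        return None
    modified = EPOCH + timedelta(milliseconds=file_status["modificationTime"])
    return expires - modified


# Hypothetical FileStatus values for a file uploaded with a one-week expiration.
status = {"modificationTime": 1_500_000_000_000, "msExpirationTime": 1_500_604_800_000}
print(expiration_of(status))          # 2017-07-21 02:40:00+00:00
print(lifetime_after_upload(status))  # 7 days, 0:00:00
```

Running a check like this over the scripts a Data Factory depends on would flag any that were uploaded with an expiration, rather than finding out when the pipeline fails.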

We then tested Visual Studio Code on OS X with the Azure Data Lake tools there. Uploads from those tools did not set an expiration time.

I can see reasons why you might want to set an expiration on files, but for the tooling to force one with no option to disable it is just plain wrong.

This entry was posted in Azure, Data Lake, U-SQL.
