Microsoft said Monday it was taking steps to fix a glaring security flaw that led to the exposure of 38 terabytes of private data.
The leak was discovered on the company’s AI GitHub repository and is said to have been accidentally exposed when publishing a bucket of open-source training data, Wiz said. It also included a disk backup of two former employees’ workstations containing secrets, keys, passwords, and over 30,000 internal Teams messages.
The repository, named “robust-models-transfer,” is no longer accessible. Prior to its takedown, it contained source code and machine learning models pertaining to a 2020 research paper titled “Do Adversarially Robust ImageNet Models Transfer Better?”
“The exposure came as the result of an overly permissive SAS token – an Azure feature that allows users to share data in a way that is both hard to track and hard to revoke,” Wiz said in a report. The issue was reported to Microsoft on June 22, 2023.

Specifically, the repository’s README.md file instructed developers to download the models from an Azure Storage URL that accidentally also provided access to the entire storage account, thereby exposing additional private data.
“In addition to the overly permissive access scope, the token was also incorrectly configured to allow ‘full control’ permissions instead of read-only,” Wiz researchers Hillai Ben-Sasson and Ronny Greenberg said. “This means that an attacker could not only see all the files in the storage account, but they could also delete and overwrite existing files.”
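The gap between those two permission models is easy to see in code. The sketch below is a minimal illustration using the azure-storage-blob Python SDK, with hypothetical account, container, and blob names: it contrasts a long-lived, full-control Account SAS of the kind Wiz describes with a narrowly scoped, read-only token.

```python
# Minimal sketch using the azure-storage-blob SDK; account, container, and
# blob names are placeholders, not the actual assets involved in the leak.
from datetime import datetime, timedelta, timezone

from azure.storage.blob import (
    AccountSasPermissions,
    BlobSasPermissions,
    ResourceTypes,
    generate_account_sas,
    generate_blob_sas,
)

ACCOUNT_NAME = "examplestorageaccount"  # placeholder
ACCOUNT_KEY = "<account-key>"           # placeholder

# Risky: an Account SAS granting full control over every container and blob
# in the account, valid for decades -- effectively as powerful as the
# account key itself, yet hard to track and hard to revoke once shared.
overly_permissive_sas = generate_account_sas(
    account_name=ACCOUNT_NAME,
    account_key=ACCOUNT_KEY,
    resource_types=ResourceTypes(service=True, container=True, object=True),
    permission=AccountSasPermissions(
        read=True, write=True, delete=True, list=True, create=True
    ),
    expiry=datetime(2051, 1, 1, tzinfo=timezone.utc),  # illustrative far-future date
)

# Safer: a read-only SAS scoped to a single blob with a short lifetime.
scoped_readonly_sas = generate_blob_sas(
    account_name=ACCOUNT_NAME,
    container_name="models",             # placeholder
    blob_name="robust_resnet50.pt",      # placeholder
    account_key=ACCOUNT_KEY,
    permission=BlobSasPermissions(read=True),
    expiry=datetime.now(timezone.utc) + timedelta(hours=1),
)
```

Because an Account SAS like the first one is signed with the account key and grants sweeping access until it expires, leaking such a URL is comparable to leaking the key itself.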

In response to the findings, Microsoft said its investigation found no evidence of unauthorized exposure of customer data and that “no other internal services were compromised due to this issue.” It also emphasized that customers do not need to take any action.
The Windows maker further noted that it revoked the SAS token and blocked all external access to the storage account. The issue was resolved two days after responsible disclosure.

To mitigate such risks going forward, the company has expanded its secret scanning service to include any SAS token that may have overly permissive expirations or privileges. It said it also identified a bug in its scanning system that flagged the specific SAS URL in the repository as a false positive.
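Conceptually, that kind of scanning boils down to parsing the signed query parameters of any SAS URLs found in code and documentation. The sketch below is a simplified, hypothetical heuristic, not Microsoft’s actual scanner logic: it flags tokens whose sp (permissions) field goes beyond read/list or whose se (expiry) lies unusually far in the future.

```python
# Simplified sketch of a SAS-token scanner; the regex, thresholds, and
# flagging rules are illustrative assumptions, not a real scanner's logic.
import re
from datetime import datetime, timedelta, timezone
from urllib.parse import parse_qs, urlparse

# Match Azure Blob Storage URLs; SAS URLs carry signed query parameters
# such as sp (permissions), se (expiry), and sig (signature).
AZURE_BLOB_URL_RE = re.compile(
    r"https://[\w.-]+\.blob\.core\.windows\.net/\S+", re.I
)

def flag_risky_sas(text: str, max_lifetime: timedelta = timedelta(days=30)):
    """Return (url, reasons) pairs for SAS URLs that look overly permissive."""
    findings = []
    for match in AZURE_BLOB_URL_RE.finditer(text):
        url = match.group()
        params = parse_qs(urlparse(url).query)
        if "sig" not in params:
            continue  # a plain blob URL, not a SAS URL
        perms = params.get("sp", [""])[0]
        expiry = params.get("se", [""])[0]
        reasons = []
        if set(perms) - set("rl"):  # anything beyond read/list
            reasons.append(f"write-capable permissions: sp={perms}")
        try:
            se = datetime.fromisoformat(expiry.replace("Z", "+00:00"))
            if se - datetime.now(timezone.utc) > max_lifetime:
                reasons.append(f"long-lived token: se={expiry}")
        except ValueError:
            reasons.append("missing or unparseable expiry")
        if reasons:
            findings.append((url, reasons))
    return findings
```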
“Due to the lack of security and governance of Account SAS tokens, they should be considered as sensitive as the account key itself,” the researchers said. “Therefore, it is strongly recommended to avoid using Account SAS for external sharing. Errors in creating tokens can easily go unnoticed and expose sensitive data.”
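Where external sharing is unavoidable, one commonly recommended alternative is a user delegation SAS, which is signed with Azure AD credentials rather than the account key and can be revoked centrally. The sketch below, with placeholder names, shows how such a token might be issued using the azure-storage-blob and azure-identity Python SDKs.

```python
# Minimal sketch of issuing a user delegation SAS; the account, container,
# and blob names are placeholders.
from datetime import datetime, timedelta, timezone

from azure.identity import DefaultAzureCredential
from azure.storage.blob import (
    BlobSasPermissions,
    BlobServiceClient,
    generate_blob_sas,
)

ACCOUNT_NAME = "examplestorageaccount"  # placeholder
account_url = f"https://{ACCOUNT_NAME}.blob.core.windows.net"

# Authenticate with Azure AD instead of the account key.
service = BlobServiceClient(account_url, credential=DefaultAzureCredential())

start = datetime.now(timezone.utc)
expiry = start + timedelta(hours=1)

# The delegation key is time-bounded and tied to an Azure AD identity,
# so access can be audited and revoked centrally.
delegation_key = service.get_user_delegation_key(start, expiry)

sas = generate_blob_sas(
    account_name=ACCOUNT_NAME,
    container_name="models",             # placeholder
    blob_name="robust_resnet50.pt",      # placeholder
    user_delegation_key=delegation_key,
    permission=BlobSasPermissions(read=True),
    expiry=expiry,
)
```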
This is not the first time misconfigured Azure storage accounts have come to light. In July 2022, JUMPSEC Labs highlighted a scenario in which a threat actor could take advantage of such accounts to gain access to an on-premises business environment.
The development is the latest security blunder at Microsoft and comes nearly two weeks after the company revealed that hackers based in China were able to infiltrate its systems and steal a highly sensitive signing key by compromising an engineer’s corporate account and likely accessing a crash dump of the consumer signing system.
“AI opens up enormous potential for technology companies. But as data scientists and engineers race to bring new AI solutions to production, the massive amounts of data they handle require additional security checks and safeguards,” Wiz CTO and co-founder Ami Luttwak said in a statement.
“This new technology requires large data sets to train on. With many development teams needing to manipulate huge amounts of data, share it with their colleagues, or collaborate on public open source projects, cases like Microsoft’s are becoming increasingly difficult to monitor and avoid.”