Machine Learning

Microsoft data linking

From 2010 to 2013, I was one of two researchers who, together with a small handful of developers, built a production system for data integration, an application of machine learning in databases that leveraged our research at Microsoft (e.g., [VLDB’12]). The system shipped multiple times internally, earning four ShipIt! awards for sustained product transfer. Notable applications were to the Bing search engine, across multiple verticals, and to the Xbox game console. After the 2011/12 refresh, in which our data integration was a key contribution from Research, Xbox revenue increased by several hundred million dollars (due to increased sales of consoles and Xbox Live subscriptions). Within Microsoft Research, this impact was attributed to our small team. In Bing’s social vertical, our system matched over 1 billion records daily. I continue to work on data integration at Melbourne.

Predicting liver transplantation failure

Through 2016, my group and colleague Bailey collaborated with the Austin Hospital’s transplantation unit on predicting outcomes (graft failure) of liver transplantation for Australian demographics. Using machine learning-based approaches, PhD student Yamuna Kankanige improved by over 20% on the predictive accuracy of the Donor Risk Index [Transplantation’17], a risk score widely used by Australian surgeons today in planning transplants and follow-up interventions.

Media coverage: 9news, heraldsun

Privacy & Security

30yr Medicare/PBS dataset and the Re-identification Criminalisation Bill

With colleagues Teague and Culnane, I helped uncover one of the largest privacy breaches in Australian history, over 2016–17. In mid-2016, federal health and human services released an open dataset of 30 years of Medicare and Pharmaceutical Benefits Scheme transaction records, covering 10% of the Australian population. The intention was to drive health economics research for evidence-based policy development. Unfortunately, minimal privacy protections were in place, even though the data reported sensitive treatments (e.g., for AIDS and late-term abortions). Initially we completely reidentified doctors, due to improper hashing of their IDs. As a result, the Department took the dataset offline and released a public statement. Copies already downloaded could not be recalled. A year later we announced that we had reidentified patients, including well-known figures in Australian sport and politics.
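To illustrate in general terms why hashing alone is a weak protection for identifiers, the following is a minimal sketch of a dictionary attack. It is not the encoding used in the released dataset nor the method we applied: the unsalted SHA-256 encoding, the 5-digit ID space, and all function names are assumptions made purely for illustration. When the space of possible IDs is small and the hash is unsalted, every candidate can be hashed in advance and published values reversed by lookup.

```python
import hashlib


def hash_id(raw_id: str) -> str:
    """Unsalted SHA-256 of an ID (hypothetical encoding, for illustration only)."""
    return hashlib.sha256(raw_id.encode("utf-8")).hexdigest()


def build_reverse_table(id_space):
    """Hash every candidate ID once, mapping each hash back to its ID."""
    return {hash_id(candidate): candidate for candidate in id_space}


def reidentify(published_hashes, id_space):
    """Recover the original ID for each published hash found in the table."""
    table = build_reverse_table(id_space)
    return {h: table[h] for h in published_hashes if h in table}


# Hypothetical ID space: all 5-digit identifiers (100,000 candidates).
id_space = [f"{n:05d}" for n in range(100_000)]

# Hypothetical "published" pseudonyms.
published = [hash_id("04217"), hash_id("91827")]

recovered = reidentify(published, id_space)
```

A secret key or per-dataset salt of sufficient length would defeat this particular enumeration, though it would not by itself address linkage on the remaining attributes.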

The day after Medicare’s retraction, the Attorney General published a plan to legislate against reidentification of Commonwealth datasets. In the following months the Reidentification Criminal Offence Bill (an amendment to the Privacy Act 1988) was introduced to Parliament, criminalising the act of reidentification unless performed with prior permission. If passed, the bill would apply retroactively and reverse the burden of proof onto the accused. While stifling security experts and journalists who responsibly disclose existing privacy breaches to the government, the bill would not prevent private corporations or foreign entities outside Australian jurisdiction from misusing Commonwealth data. Of 15 submissions to the ensuing Parliamentary Inquiry examining the appropriateness of the bill, 14 were against, including those of the Law Council of Australia, the Australian Bankers’ Association, and the EFF. Our submission to the inquiry achieved significant impact, being directly quoted 9 times in the Senate Committee’s final report. We also wrote an Op-Ed in the Sydney Morning Herald explaining why criminalising reidentification would do more harm than good.

Media coverage (2016): zdnet (again), The Register, itnews (again), ABS news, The Guardian, The Age, CSO, HuffPo, Canberra Times, Crikey, ComputerWorld, Gizmodo, Digital Rights Watch, The Saturday Paper
Media coverage (2017 exceeding 1m views): ABC, Sydney Morning Herald, IT News, ZdNet, The Register, SBS News, Business Insider, News.com.au, Daily Telegraph, Brisbane Times, Computer World, LifeHacker, BoingBoing, Northern Star, BuzzFeed

Technical privacy assessments: ABS, ONS, Opal

Since 2016, also with colleagues Culnane and Teague, I have contributed to several technical privacy assessments of government data initiatives. Under contract to the Australian Bureau of Statistics (ABS), we have analysed the privacy of several options for name encoding in private record linkage, as might be used for Australian Census data, for example. For Transport for NSW, again under contract, we have performed a technical privacy assessment of a Data61-processed dataset of Opal transport card touch ons and offs for buses, trains, and ferries. The data has subsequently been published. In a third privacy assessment, we discovered vulnerabilities in the hashing methodology published by the UK Office for National Statistics (explained here). Common themes of this work are reflected in our 2018 report for the Office of the Victorian Information Commissioner.

Media coverage (2018): Mandarin

Promoting privacy through cheating at Kaggle

In 2011, with Narayanan (now Princeton) and Shi (now Cornell), I helped demonstrate the power of privacy attacks to Kaggle (a platform for crowdsourcing machine learning that raised a $16m Series A and was later acquired by Google) [IJCNN’11]. After determining the source of an anonymised social network dataset, intended for use in a link prediction contest, we downloaded the source data and linked it to the competition test set. Normally a linkage attack would end there, having re-identified users. We instead used the linkage to look up the correct test answers and win the competition by ‘cheating’. No privacy breach resulted and contestants remained able to compete. However, the result raised Kaggle’s awareness of the stark reality of privacy attacks. Team member Narayanan subsequently consulted on the privacy of the $3m Heritage Health Prize dataset.
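The final look-up step described above can be sketched as follows. This is a toy illustration rather than the method of [IJCNN’11]: it assumes the hard part, linking anonymised nodes to nodes in the downloaded source graph, has already produced a mapping, and the names `anon_to_real`, `crawled_edges`, and `answer_test_pairs` as well as the data are hypothetical.

```python
def answer_test_pairs(test_pairs, anon_to_real, crawled_edges):
    """Answer link-prediction queries by look-up in the re-identified graph.

    Returns 1 if the edge exists in the crawled graph, 0 if it does not,
    and None when either endpoint could not be linked.
    """
    answers = []
    for u, v in test_pairs:
        real_u = anon_to_real.get(u)
        real_v = anon_to_real.get(v)
        if real_u is None or real_v is None:
            answers.append(None)  # unlinked node: fall back to a learned model
        else:
            answers.append(1 if (real_u, real_v) in crawled_edges else 0)
    return answers


# Hypothetical output of the linking step: anonymised ID -> source-site user.
anon_to_real = {"a1": "alice", "a2": "bob", "a3": "carol"}

# Hypothetical directed edges observed in the downloaded source graph.
crawled_edges = {("alice", "bob"), ("bob", "carol")}

# Hypothetical test queries from the contest, over anonymised IDs.
test_pairs = [("a1", "a2"), ("a1", "a3"), ("a2", "a9")]

predictions = answer_test_pairs(test_pairs, anon_to_real, crawled_edges)
```

Pairs with an unlinked endpoint are left as `None` so that an ordinary link-prediction model can fill them in.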

Side-channel attacks on Firefox

With a Berkeley group led by Dawn Song [report], I helped improve the security of Mozilla’s open-source development processes. While open-source projects tend to improve system security through the principle of ‘many eyes’, Mozilla was publishing security-related commits to the public Firefox web browser source repository, often a month before those commits would be automatically pushed to users. We trained a learning-based ranker to predict which commits were more likely to be security-related. An attacker could then sift through just the few top-ranked commits by hand to find zero-day exploits, on average a month prior to patching. As a result of our work, Mozilla made security-related commits private until they were published as patches.

Funding & Awards

Funding

Since arriving at the University of Melbourne in October 2013, I have been awarded competitive funding (Cat 1–4) of $3.09m total, $1.72m as lead-CI, $1.19m on a per-CI basis. Funding includes:

  • 2018–2019 $50k: Australian Bureau of Statistics Research Contract, Disclosure Risk Analysis, Chris Culnane, Benjamin Rubinstein.
  • 2018–2019 $153k: U.S. Army Research Office Research Grant, Towards designing complex networks resilient to stealthy attack and cascading failure, Antoinette Tordesillas, Benjamin Rubinstein, James Bailey, Howard Bondell.
  • 2018–2019 $24k: Mondo Power AMSI internship program, Anomaly detection in time series energy consumption data, Benjamin Rubinstein, Leyla Roohi.
  • 2018 $31k: Australian Bureau of Statistics Research Contract, Scaling up Bayesian record linkage, Benjamin Rubinstein, Neil Marchant.
  • 2018–2019 $77k: Oceania Cyber Security Centre Seed Grant, Detection of Infected Internet-of-Things Devices to Prevent Distributed Denial of Service Attacks, Sarah Erfani et al.
  • 2017–2019 $705k: Defence Science & Technology Group and Data61/CSIRO Next Gen Tech Fund CRP, Adversarial Machine Learning for Cyber, Benjamin Rubinstein et al.
  • 2017–2021 $970k: Department of Education and Training Academic Centre for Cyber Security Excellence (ACCSE), Chris Leckie et al.
  • 2017–2018 $93k: Defence Science & Technology Group Research Contract, Tactical Security and Health in Multi-Modal Sensor Control and Management, Iman Shames, Benjamin Rubinstein, Farhad Farokhi.
  • 2017–2018 $24k: Australian Bureau of Statistics AMSI internship program, Evaluating feasibility of Bayesian entity resolution, Benjamin Rubinstein, Neil Marchant.
  • 2017 $168k: Australian Bureau of Statistics Research Contract, Design of securely encrypted (anonymised) linkage keys, Benjamin Rubinstein, Chris Culnane, Vanessa Teague.
  • 2017 $30k: Transport for NSW Research Contract, Analysis of privacy protections in Transport for NSW Opal data, Benjamin Rubinstein, Chris Culnane, Vanessa Teague.
  • 2017 $35k: Office of the Commissioner for Privacy and Data Protection Research Contract, Implications of de-identification of personal information and impact of de-identification on the Privacy & Data Protection Act (Vic) 2014, Vanessa Teague, Chris Culnane, Benjamin Rubinstein.
  • 2016–2018 $370k: Australian Research Council Discovery Early Career Researcher Award (DECRA), Secure and Private Machine Learning, Benjamin Rubinstein.
  • 2016–2018 $85k: University of Melbourne DECRA Establishment Grant, Secure and Private Machine Learning, Benjamin Rubinstein.
  • 2015 $48k: Melbourne Networked Society Institute Seed Grant, Active Defence, Benjamin Rubinstein et al. Pursuit article
  • 2015–2016 $128k: FLI Project Grant, Security Evaluation of Machine Learning Systems, Benjamin Rubinstein. Funds backed by Elon Musk, media: vice news.com.au pursuit
  • 2015 $20k: Microsoft Research Azure Machine Learning Award, Big data preparation, Benjamin Rubinstein. In kind support
  • 2015–2017 $216k: Australian Research Council Discovery Project, Benjamin Rubinstein. First early-career sole-CI in FOR08, nationally, for 3 years
  • 2014 $39k: University of Melbourne ECR Grant, Adversarial Machine Learning, Benjamin Rubinstein.
  • 2014 $5k: Amazon AWS Machine Learning Grant, Adversarial Machine Learning, Benjamin Rubinstein.

Awards & Honours

  • Best Reviewer Award (2018), Thirty-second Conference on Neural Information Processing Systems (NeurIPS, formerly NIPS)
  • WiE Best Postgrad Paper Prize (2017), IEEE Australia Council for PhD student Maryam Fanaeepour’s joint work
  • Victorian Young Tall Poppy Science Award (2016), Australian Institute of Policy & Science
  • Microsoft Azure ML Award (2015), Microsoft Research
  • Excellence in Research Award (2014), Dept CIS, University of Melbourne
  • Gold Star Award (2011), Microsoft Research, top employee accolade
  • ShipIt! Awards (2010–12, four times), Microsoft, each for sustained product transfer
  • Yahoo! Key Scientific Challenge Prize (2009), Adversarial Machine Learning
  • Siebel Scholars Fellowship (2009), Siebel Foundation, final year graduate fellowship
  • Best Poster Award (2008), 11th Int. Symp. Recent Advances in Intrusion Detection (RAID’08)
  • UC Regents University Fellowship (2004–05), UC Berkeley, first year graduate fellowship
  • IEEE Computer Society Larson Best Paper Prize (2002), for undergraduate papers worldwide

Service

Speaking engagements

Decadal plan

As a member of the Australian Academy of Science’s National Committee for Information and Communication Sciences (2015–2020), I am contributing to the development of a decadal plan for ICT.

Program committee membership