Chapter 3: Small data and few-shot machine learning

“…people are individual people and not an ‘average’ (footnote 114).”

Gary Marsden, Andrew Maunder and Munier Parker

Small data analysis is the use of tools and techniques for data analysis in settings where there are only limited amounts of data and information. While disabled people are a large group at 16% of the global population (footnote 115), the wide variety of disabilities and how they are experienced means that, in practice, datasets for specific disabilities are typically small. This contrasts with dominant approaches in technology development, which emphasise the importance and use of large datasets, typically referred to as ‘big data’. Small data is an alternative to this paradigm, where advancements in small data methods and techniques could offer opportunities to create new DigAT and enable more inclusive analysis of data.

What is small data and what are its benefits?

Small data analysis refers to the ability to derive insights and analyse detailed context-specific information from smaller datasets. These approaches have always been important to scientific research with early scientific discoveries, such as early astronomical observations, relying on small numbers of observations. However, since the early 2000s, big data approaches, relying on the ability to search and analyse vast datasets, have become increasingly popular due to advancements in computing power and access to large quantities of data.

Small data techniques can be invaluable in situations where large datasets are simply not available, such as research on rare diseases or creating products for niche markets. Big data approaches often rely on the misleading assumption that bigger datasets lead to more reliable conclusions (footnote 116). In practice, these techniques often fail when confronted with outliers or unique scenarios. For example, AI models used for self-driving cars have failed to recognise backward-propelled wheelchairs despite being trained on wheelchair-representative datasets (footnote 117). Small data techniques can preserve more contextual information and improve reliability when datasets are smaller and contain large variations.

Small data and more personalised approaches can better capture the unique and diverse experiences of individuals. Big data approaches can lead to an overemphasis on the average needs of a population, neglecting those who fall outside the ‘norm’. This can be particularly concerning when big data and statistical averages are used for decision making and policy making. While averages can be useful summaries, they mask important variations and can lead to decisions that prioritise the needs of the majority without adequately addressing the needs of all individuals, particularly disabled people who may have less common needs (footnote 118).

What are the techniques for small data? (footnote 119)

Small data research is currently undertaken across many disciplines, meaning a range of different methods is used. In applying small data methods, three key concepts are used: similarity, transfer and uncertainty.

  • Similarity
    Determining the similarity between different datasets is important when working with small data. Several quantitative methods (footnote 120) have been developed to assess similarity between different datasets, which can help with assessing whether datasets can be combined and whether insights from one group can apply to another. For example, in rare disease research, assessing similarity between different patient groups can help with leveraging evidence from similar cases to improve treatment prediction.
  • Transfer
    Transferring information is key for small data: information can be transferred between similar datasets, or a small dataset can be enriched with information from external sources, such as databases or other models. These methods can include techniques for few-shot learning, representation learning and neuro-symbolic AI.
  • Uncertainty
    Uncertainty is particularly important in small data settings due to the limited information available for modelling. Several methods can be used to quantify and estimate uncertainty in model parameters though more work is needed to assess uncertainty from model selection. One important approach for reducing uncertainty is meta-learning, where a model learns across many datasets.
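
As a minimal sketch of two of these concepts, the Python below computes a standardised mean difference as one simple similarity measure between two small groups, and a percentile bootstrap interval as one simple way to express uncertainty in an estimate. The measurements, function names and choice of methods are illustrative assumptions and are not drawn from the methods cited above.

```python
import random
import statistics


def standardised_mean_difference(a, b):
    """Difference in means divided by the pooled standard deviation."""
    pooled_var = (
        (len(a) - 1) * statistics.variance(a) + (len(b) - 1) * statistics.variance(b)
    ) / (len(a) + len(b) - 2)
    return (statistics.mean(a) - statistics.mean(b)) / pooled_var ** 0.5


def bootstrap_interval(sample, n_resamples=2000, level=0.95, seed=0):
    """Percentile bootstrap interval for the mean of a small sample."""
    rng = random.Random(seed)
    # Resample with replacement and record the mean of each resample.
    means = sorted(
        statistics.mean(rng.choices(sample, k=len(sample)))
        for _ in range(n_resamples)
    )
    lower_index = int((1 - level) / 2 * n_resamples)
    upper_index = int((1 + level) / 2 * n_resamples) - 1
    return means[lower_index], means[upper_index]


# Hypothetical measurements from two small patient groups.
group_a = [4.1, 3.8, 4.4, 4.0, 3.9]
group_b = [4.2, 4.0, 4.3, 3.7, 4.1]

smd = standardised_mean_difference(group_a, group_b)
low, high = bootstrap_interval(group_a + group_b)
print(f"standardised mean difference: {smd:.2f}")
print(f"95% bootstrap interval for the pooled mean: {low:.2f} to {high:.2f}")
```

A value close to zero for the standardised mean difference suggests the groups are similar on this one measurement, which is one input into deciding whether pooling them is reasonable. The width of the bootstrap interval shows how much uncertainty remains after pooling.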

The following examples are some techniques and methods that are useful for small data:

Few-shot learning

Few-shot learning is a machine learning technique for learning a task or category given a small number of examples (footnote 121). Humans are natural small data learners: given a few images of a car, children can generalise the concept and recognise similar objects. Few-shot learning techniques aim to apply this idea to machine learning systems. Few-shot learning is an attractive tool for tackling small data challenges as it aims to optimise performance when data is scarce, as is often the case for disabled communities.
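
One simple way to illustrate the idea is a nearest-prototype classifier: each category is summarised by the average of its few examples, and a new item is assigned to the closest average. The sketch below assumes the feature vectors are already available (in practice they would usually come from a pretrained model), and the category names and numbers are hypothetical. It is not the method used by any of the tools discussed in this chapter.

```python
import math


def centroid(points):
    """Mean of a list of equal-length feature vectors."""
    return [sum(values) / len(points) for values in zip(*points)]


def fit_prototypes(support_set):
    """Build one prototype per class from a handful of labelled examples."""
    return {label: centroid(examples) for label, examples in support_set.items()}


def classify(prototypes, query):
    """Assign the query to the class whose prototype is nearest."""
    return min(prototypes, key=lambda label: math.dist(prototypes[label], query))


# Hypothetical 2-D features, three examples ("shots") per class.
support_set = {
    "keys": [[0.9, 0.1], [1.0, 0.2], [0.8, 0.0]],
    "wallet": [[0.1, 0.9], [0.2, 1.0], [0.0, 0.8]],
}
prototypes = fit_prototypes(support_set)
print(classify(prototypes, [0.7, 0.3]))
```

Adding a new category only requires a few labelled examples and one more prototype, with no retraining of the underlying feature model.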

Meta-learning

Meta-learning, or ‘learning to learn’, refers to a technique for training machine learning models using knowledge from several (potentially small) datasets. By training a model on several datasets, the aim is for the model to then be more readily adaptable to new tasks with few examples, which is especially useful for small data settings.
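
The sketch below illustrates this with a Reptile-style update, a simple first-order meta-learning rule chosen here for brevity; the text above does not name a specific algorithm. Each task is a hypothetical one-parameter regression problem (y = slope × x), and all names, slopes and learning rates are assumptions made for illustration.

```python
import random


def adapt(weight, xs, ys, steps=5, lr=0.05):
    """A few gradient steps on mean squared error for the model y = weight * x."""
    for _ in range(steps):
        grad = sum(2 * (weight * x - y) * x for x, y in zip(xs, ys)) / len(xs)
        weight -= lr * grad
    return weight


def meta_train(task_slopes, rounds=200, meta_lr=0.1, seed=0):
    """Reptile-style loop: nudge the shared starting weight towards each adapted weight."""
    rng = random.Random(seed)
    xs = [1.0, 2.0, 3.0]
    meta_weight = 0.0
    for _ in range(rounds):
        slope = rng.choice(task_slopes)      # sample one small task
        ys = [slope * x for x in xs]
        adapted = adapt(meta_weight, xs, ys)  # adapt to that task
        meta_weight += meta_lr * (adapted - meta_weight)
    return meta_weight


# Several related small tasks, then a new task with only two examples.
task_slopes = [2.5, 3.0, 3.5]
meta_weight = meta_train(task_slopes)

new_xs, new_slope = [1.0, 2.0], 3.2
new_ys = [new_slope * x for x in new_xs]
from_meta = adapt(meta_weight, new_xs, new_ys, steps=2)
from_scratch = adapt(0.0, new_xs, new_ys, steps=2)
print(f"error from meta-learned start: {abs(from_meta - new_slope):.3f}")
print(f"error from untrained start:    {abs(from_scratch - new_slope):.3f}")
```

With the same two examples and the same two adaptation steps, the model that starts from the meta-learned weight ends closer to the new task than the model that starts from zero.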

Neuro-symbolic AI

Neuro-symbolic AI combines two approaches to AI: neural networks based on data-driven modelling and symbolic AI, which builds explicit knowledge or rules into a system. This combines the strengths of neural networks, which can learn from large amounts of data but are difficult to interpret, with the strengths of symbolic AI techniques, which rely on knowledge and assumptions explicitly coded into a system, increasing explainability and efficiency. This approach can be useful for integrating small data with big data, where small data is explicit knowledge built into a larger neural network based on big data.

What is the potential of small data analytics for supporting disabled people?

Advancements in small data research and techniques could significantly improve analysis of disability data and create better DigAT.

Small data for research and policy

Small data approaches are a necessity when analysing small datasets, such as in rare disease research (diseases affecting fewer than one in two thousand people). For example, a clinician may need to assess the right dose of a treatment for a new child patient with a rare genetic condition. Given the small number of previous patients with the condition, small data approaches could be used to match the new patient to the most similar subgroup of patients (eg patients under 10 years old) or draw on relevant information (eg age) to enable better predictions of the right dose (footnote 122). Small data techniques could also be used to combine data from several different studies of individual patients to create a relatively larger dataset to be used for dose predictions (footnote 123).

Analysis of small data can also be useful for evidence-informed policy-making. The use of big data analysis for policy-making prioritises the needs of the average individual, reinforcing the ‘invisibility’ of marginalised groups, such as disabled people, in decisions around spending priorities. Small data approaches could help ensure policy-making is more contextual and inclusive, leading to better outcomes for both individuals and society as a whole (footnote 124). However, there are trade-offs involved since insights from small data will need to be balanced against other data to ensure small datasets are not skewed or biased.

Personalising DigAT

Small data approaches can also help personalise DigAT to better suit disabled people’s unique needs. Few-shot learning, meta-learning and neuro-symbolic AI can enable systems to learn from smaller datasets, creating opportunities for new adaptable DigATs.

Emerging research uses few-shot learning and meta-learning for automated sign language recognition systems (footnote 125), (footnote 126), for personalisation of sound recognition systems used by D/deaf and hard-of-hearing users (footnote 127) and for the design of Augmentative and Alternative Communication (AAC) systems for people with complex speech and communication needs (footnote 128), (footnote 129).

For example, WESPER is a zero-shot AI tool converting whispers to normal speech, which can be useful for people with hearing loss (footnote 130). FindMyThings, developed by Microsoft, is an AI object recognition tool designed to help people with vision loss find their personal items, which uses few-shot learning to reduce the number of examples required to complete the task with minimal effort from users (footnote 131). Neuro-symbolic AI could be used to personalise devices, such as smartphones, by suggesting optimal accessibility settings on a phone based on data inputted by a disabled user (footnote 132).

Analysing small data is also key to advancements in wearables and remote monitoring devices, such as those used in social care settings to detect falls. Fall detection is challenging due to significant variations in human bodies and how movement is recorded (footnote 133). Small data approaches comparing small and large datasets can be used to personalise these devices by using an individual’s collected data to understand how they normally move and improve accuracy (footnote 134).

What are the limitations of small data approaches?

Small data problems occur in a range of fields and hence, small data methodologies have been developed across many research areas. While this shows the relevance of small data research in many domains, it also means that research may be impeded due to a lack of interdisciplinary communication. This includes a lack of shared language for small data approaches.

One key limitation of small data approaches is the risk of overfitting, where the model learns patterns that are too closely aligned to the training data and fails to generalise to other datasets. While this is also a risk in big data approaches, the limited information available in small datasets means the data may not be diverse enough to cover a wide range of situations making it more likely a model will learn specific patterns that do not generalise (footnote 135). This risk is heightened in cases where certain categories are overrepresented in a dataset leading to biased predictions or when a model relies on historical data, where the underlying pattern could change in the future (footnote 136).
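
The effect can be seen in a minimal sketch using a one-nearest-neighbour model, which simply memorises its training data. The dataset is hypothetical and deliberately noisy, and the function names are illustrative. The model is scored once on the data it was trained on and once with leave-one-out validation, where each point is predicted from all the others.

```python
def nearest_label(train, x):
    """Label of the training point closest to x (1-nearest-neighbour)."""
    return min(train, key=lambda point: abs(point[0] - x))[1]


def accuracy(train, test):
    """Share of test points whose nearest training point has the same label."""
    return sum(nearest_label(train, x) == y for x, y in test) / len(test)


def leave_one_out_accuracy(data):
    """Predict each point from all other points, so it is never seen in training."""
    hits = 0
    for i, (x, y) in enumerate(data):
        held_in = data[:i] + data[i + 1:]
        hits += nearest_label(held_in, x) == y
    return hits / len(data)


# Hypothetical one-feature dataset of (feature, label) pairs with noisy labels.
data = [(1, 0), (2, 0), (4, 0), (5, 1), (7, 1), (8, 0), (10, 1), (11, 1)]

train_accuracy = accuracy(data, data)             # scored on memorised data
held_out_accuracy = leave_one_out_accuracy(data)  # scored on unseen points
print(f"accuracy on training data: {train_accuracy:.2f}")
print(f"leave-one-out accuracy:    {held_out_accuracy:.2f}")
```

The gap between the two scores is the signature of overfitting: the model reproduces its training data perfectly but does much worse on points it has not seen.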

Another limitation of small data approaches is the challenge of validating models trained on small data. To validate a model, there should be no overlap between the dataset used to train the model and the dataset used to test the model. An overlap, a phenomenon known as data leakage, leads to an overestimation of the model’s accuracy and decreases the ability of a model to generalise to new data, an effect which is amplified for small datasets. External validation, where a model is tested on new similar datasets, is challenging in small data settings where there may be a scarcity of data available for training, let alone validation.
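
One common way leakage arises in datasets about individuals is when recordings from the same person end up in both the training and test sets. As a hedged sketch, the code below splits a hypothetical set of records by participant rather than by record, then checks that no participant appears on both sides. The field names and the 25% test fraction are assumptions for illustration.

```python
import random


def split_by_participant(records, test_fraction=0.25, seed=0):
    """Split records so that each participant appears on only one side."""
    participants = sorted({record["participant"] for record in records})
    rng = random.Random(seed)
    rng.shuffle(participants)
    n_test = max(1, round(len(participants) * test_fraction))
    test_ids = set(participants[:n_test])
    train = [r for r in records if r["participant"] not in test_ids]
    test = [r for r in records if r["participant"] in test_ids]
    return train, test


def overlapping_participants(train, test):
    """Participants present in both sets; an empty set means no leakage of this kind."""
    return {r["participant"] for r in train} & {r["participant"] for r in test}


# Hypothetical records: three recordings from each of four participants.
records = [
    {"participant": pid, "recording": n}
    for pid in ["p1", "p2", "p3", "p4"]
    for n in range(3)
]
train, test = split_by_participant(records)
print(len(train), len(test), overlapping_participants(train, test))
```

Splitting by participant leaves fewer independent units for testing, which is part of why validation is harder when the overall dataset is small.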

To address these challenges, assessment of similarity between datasets is crucial. Policies facilitating data exchange such as encouraging collaboration and providing data sharing infrastructure for researchers can help with addressing data scarcity (footnote 137). The creation of datasets in accordance with the FAIR (Findable, Accessible, Interoperable, Reusable) principles enables easier comparisons for similarity assessments (footnote 138).

Case study 3: DigAT for travel and tourism

For travel and tourism, DigAT can be used by disabled people to address challenges related to navigation, anxiety and communication. DigAT can improve how disabled travellers locate and interact with relevant information and create more opportunities for accessible experiences while travelling.

Opportunities

One key challenge disabled travellers face is navigation, as it can be difficult to know in advance whether a route or location meets their accessibility needs. While mainstream travel search sites allow for filtering of search results for ‘accessibility’, these results are often unreliable due to lack of standardisation (footnote 139) and navigation apps often don’t include accessibility features (footnote 140). Online platforms, such as accessibleGO (footnote 141) and wheelmap (footnote 142), include more specific information about facilities, for example accessible bathrooms and showers, so travellers with mobility issues can accurately check whether a location meets their needs. Companies, such as Ocean 3D, create virtual tours of airports, hotels and bars, which anxious people are able to view on their computer or on virtual reality headsets to explore a route and practise ahead of a trip (footnote 143).

While travelling or staying in novel environments, disabled travellers can face communication issues or barriers to essential information, which may only be communicated in one format such as audio announcements in a noisy environment. Audio-to-text systems can be useful, such as Spoke, a mobile app that integrates with public address systems to convert live announcements and sounds into written text for d/Deaf and hard of hearing people while travelling (footnote 144). Navigation apps can increase the confidence of Blind and partially sighted people when travelling (footnote 145). For example, WayMap uses location technologies that don’t require wi-fi or mobile signal to help users navigate locations such as train stations with audio instructions (footnote 146).

On reaching a destination, DigAT can be used to create more customised accessible experiences for disabled people. Through 3D printing and digital near field communication (NFC) technologies, PictureLive creates audio-tactile interactive experiences of visual information and artefacts for blind and partially sighted people who are often excluded from traditional “sight-seeing” when travelling (footnote 147). Virtual reality headsets can also be used to provide tours of archaeological sites which are often inaccessible for wheelchair users (footnote 148).

Challenges

One challenge for using DigAT for travel is a lack of consistent high-quality data across countries for navigation and about accessibility requirements. Data on specific accessibility requirements, such as sensory accommodations, are often not included in general datasets and there is a lack of standardisation of existing data meaning, for example, ‘wheelchair accessible’ doesn’t guarantee standard measurements. Additionally, it is important that accessibility information is kept up-to-date to not foster a false sense of accessibility. International differences in how disability is defined and measured can also limit disabled travellers’ ability to make well-informed decisions on whether their accessibility needs are met in different locations.

Developing robust DigAT that can be used in a wide range of contexts and locations is hindered by a lack of globally comprehensive datasets. For example, audio-to-text AI systems used to transcribe real-time information when travelling require datasets including audio and text data from multiple languages, which can be expensive to create or access. Inaccuracies can mislead disabled travellers causing frustrating or dangerous situations when a system is unable to recognise announcements or information in different languages. Using DigAT also requires reliable access to electricity and the internet, which often cannot be guaranteed while travelling, particularly in low-resource settings, reinforcing digital exclusion.

Example: Transport for London

Local public transportation authorities, such as Transport for London (TfL), can use and support DigAT to address navigation challenges when travelling. One recent TfL initiative has used Google Street View to visually map London’s busiest stations so wheelchair users can virtually navigate and plan their travel routes (footnote 149). In 2023, TfL also trialled NaviLens, an app that detects special QR codes while travelling to provide voice guidance for Blind and partially sighted people (footnote 150). This is already used in several cities such as Barcelona and New York to provide access to real-time travel information in underground stations and bus stops (footnote 151).

Conclusion

There are significant opportunities to use DigAT for addressing challenges currently encountered by disabled travellers and tourists, such as for navigation and lack of accessible experiences. However, using DigAT in this context requires better collection and standardisation of accessibility data, access to relevant datasets for the creation of globally comprehensive DigAT and support for infrastructure so disabled travellers can reliably use DigAT.