University of Electro-Communications e-Bulletin: Speech Signal Processing Based on Shallow Neural Networks

TOKYO, Dec. 26, 2021 /PRNewswire/ — University of Electro-Communications publishes the December 2021 issue of UEC e-Bulletin

The December 2021 issue of the UEC e-Bulletin includes a video profile of UEC Associate Professor Toru Nakashika describing his recent research on “Speech Signal Processing Based on Shallow Neural Networks”.

The Research Highlights are ‘Frequency analysis helps to understand sleep disorder’, Keiki Takadama; and ‘Educational measurement/Modelling performance assessment’, Masaki Uto.

The Topics column is an interview with Eriko Watanabe, Associate Professor, Department of Engineering Science, offering insights into ‘Fascination with digital holograms and their applications for imaging through semi-opaque materials’.

Research Highlights

Sleep science: Frequency analysis helps to understand sleep disorder

Sleep apnea syndrome (SAS) is a sleep disorder characterized by the occurrence of pauses in breathing (apnea) during sleep. Such pauses typically last for more than 10 seconds and are often followed by loud snoring. The brain interprets each breathing pause as danger, because of the decrease in oxygen supply, and sleep becomes shallow. As a result, a person suffering from SAS builds up a sleep debt, which may in turn lead to mental health issues like depression or dementia. To avoid medical complications, early detection of SAS is crucial. So-called non-contact detection methods are based on monitoring chest motion, e.g. by means of a sensor attached to the mattress the person is sleeping on; from the recorded bio-vibration data, breathing frequencies and amplitudes can be derived. This type of method is not always effective. For example, when a person’s breathing is ‘forced’ (breathing accompanied by thoracic and abdominal movement, and in fact also a symptom of SAS), sleep apnea is difficult to detect.

Iko Nakari and Keiki Takadama analysed bio-vibration data recorded from 9 SAS patients and 9 healthy individuals, obtained by means of a mattress sensor. Rather than looking only at respiration (between 0.1 Hz and 0.2 Hz) and heartbeat (between 0.6 Hz and 1.5 Hz) frequencies, they considered frequencies up to 8 Hz and looked at the distribution of frequencies, that is, the spectrum. When comparing frequency spectra, the researchers noticed a slight increase in frequency density around 3 Hz for the SAS patients. On a logarithmic plot of the frequency spectrum, this increase manifests itself as a convex shape. Based on this observation, the researchers defined a quantity called the degree of convexity of the logarithmic spectrum (DCLS).
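The idea of looking for a raised region in a logarithmic spectrum can be illustrated with a short Python sketch. The release does not give the actual DCLS formula, so the `bump_score` function below is a hypothetical stand-in, not the published quantity: it fits a straight line to the log spectrum up to 8 Hz and measures how far the spectrum sits above that line between 2 Hz and 4 Hz. The synthetic signal, the sampling rate and all function names are assumptions made for illustration only.

```python
import numpy as np

def log_spectrum(signal, fs, f_max=8.0):
    """Frequencies in (0, f_max] Hz and the log10 power spectrum of `signal`."""
    x = np.asarray(signal, dtype=float)
    x = x - x.mean()                                  # drop the DC offset
    freqs = np.fft.rfftfreq(x.size, d=1.0 / fs)
    power = np.abs(np.fft.rfft(x)) ** 2 / x.size
    keep = (freqs > 0) & (freqs <= f_max)
    return freqs[keep], np.log10(power[keep] + 1e-12)

def bump_score(freqs, log_power, band=(2.0, 4.0)):
    """Hypothetical score (NOT the published DCLS): mean height of the
    log spectrum above a straight-line fit, inside `band`."""
    slope, intercept = np.polyfit(freqs, log_power, 1)
    residual = log_power - (slope * freqs + intercept)
    in_band = (freqs >= band[0]) & (freqs <= band[1])
    return float(residual[in_band].mean())

def synthetic_recording(fs, seconds, bump_gain, rng):
    """Toy signal: respiration (0.15 Hz) + heartbeat (1 Hz) + noise,
    with the noise optionally boosted around 3 Hz."""
    n = int(fs * seconds)
    t = np.arange(n) / fs
    noise_spec = np.fft.rfft(rng.normal(0.0, 0.5, n))
    freqs = np.fft.rfftfreq(n, d=1.0 / fs)
    noise_spec *= 1.0 + bump_gain * np.exp(-0.5 * ((freqs - 3.0) / 0.5) ** 2)
    noise = np.fft.irfft(noise_spec, n)
    return np.sin(2 * np.pi * 0.15 * t) + 0.3 * np.sin(2 * np.pi * 1.0 * t) + noise

rng = np.random.default_rng(0)
flat = synthetic_recording(32, 120, bump_gain=0.0, rng=rng)
bumpy = synthetic_recording(32, 120, bump_gain=3.0, rng=rng)
score_flat = bump_score(*log_spectrum(flat, fs=32))
score_bumpy = bump_score(*log_spectrum(bumpy, fs=32))
print(f"no bump: {score_flat:.2f}, bump near 3 Hz: {score_bumpy:.2f}")
```

In this toy setting the signal with extra energy near 3 Hz gets the larger score. Real mattress-sensor data would need the definition and preprocessing described in the paper.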

Remarkably, the average DCLS value for the SAS patients (≈ 99 ± 10) is completely separate from the average value for the healthy subjects (≈ 48 ± 7). Therefore, the DCLS value has the potential to be used as an indicator for SAS — obtained just by sleeping on a mattress sensor.

Further analysis showed that the increased frequency density around 3 Hz corresponds to accumulated density in the so-called WAKE stage (the first of six levels used for characterizing ‘sleep deepness’). Therefore, it is likely that the WAKE stage is different for SAS patients and people not suffering from sleep apnea. Moreover, the researchers argue that SAS subjects generate 3 Hz waves during WAKE phases, and believe that this may actually be a hitherto unknown symptom of SAS, apart from the apnea itself. However, as Nakari and Takadama point out, future work “should clarify the phenomenon around 3 Hz”.


Iko Nakari and Keiki Takadama, Sleep Apnea Syndrome Detection Based on Degree of Convexity of Logarithmic Spectrum Calculated from Overnight Bio-vibration Data of Mattress Sensor, The 43rd Annual International Conference of the IEEE Engineering in Medicine & Biology Society (EMBC2021), pp. 2274–2277 (2021).


Educational measurement: Modelling performance assessment

Performance assessment of a practical task carried out by an examinee is typically done by human raters awarding scores for different parts of the task. Often, a so-called scoring rubric is used for this purpose, listing the various parts and descriptions of the performance scores associated with them. There are some inherent shortcomings to this procedure, however, including the characteristics of the rubric’s evaluation items and the raters’ behaviour — one rater may score differently than another. Now, Masaki Uto from the University of Electro-Communications has developed a new model that takes into account the specifics of a rubric’s evaluation items and the raters.

The approach followed by Uto relies on models developed in a theoretical framework known as item response theory. It is based on a formula giving the probability Pijkr that examinee j gets score k for evaluation item i from rater r. The formula typically contains parameters such as the difficulty of the evaluation item (βi), the latent ability of the examinee (θj) and the severity of the rater (βr). The idea is that, by fitting the formula to an existing dataset with known score outcomes, good values of the parameters (like βi, θj and βr) can be obtained. However, this basic description is almost always too simplistic to give good results.
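As an illustration of what such a probability formula can look like, here is a minimal Python sketch of a conventional many-facet Rasch model in its rating-scale form. It is not Uto's extended model. The step parameters `steps` (one per score category, with the first conventionally set to 0) do not appear in the text above and are an assumption of this sketch, as are the function names and the numerical values.

```python
import numpy as np

def score_probabilities(theta_j, beta_i, beta_r, steps):
    """P(score = k), k = 1..K, for examinee ability theta_j, item difficulty
    beta_i and rater severity beta_r.

    `steps` holds assumed step-difficulty parameters d_1..d_K (d_1 = 0).
    The log-weight of score k is the sum over m = 1..k of
    (theta_j - beta_i - beta_r - d_m).
    """
    steps = np.asarray(steps, dtype=float)
    logits = np.cumsum(theta_j - beta_i - beta_r - steps)
    logits -= logits.max()            # guard against overflow in exp
    weights = np.exp(logits)
    return weights / weights.sum()

def expected_score(theta_j, beta_i, beta_r, steps):
    """Mean score implied by the category probabilities."""
    p = score_probabilities(theta_j, beta_i, beta_r, steps)
    return float(np.sum(np.arange(1, p.size + 1) * p))

steps = [0.0, -1.0, 0.0, 1.0]         # illustrative values for a 4-point scale
lenient = expected_score(0.5, 0.0, -0.5, steps)
severe = expected_score(0.5, 0.0, 1.0, steps)
print(f"lenient rater: {lenient:.2f}, severe rater: {severe:.2f}")
```

With everything else held fixed, a more severe rater (larger βr) lowers the expected score and a more able examinee (larger θj) raises it, which is the behaviour the parameters are meant to capture.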

One improvement lies in incorporating the notion of ability dimensions — an abstract representation of an examinee having different ability ‘spheres’. Uto’s model combines ability dimensions with rater characteristics, which signifies a step forward in item response theory modelling.


Apart from providing a more realistic description of performance assessment with a rubric and raters, the model can also help to check the quality of the rubric’s evaluation items, as well as providing insights into what exactly each ability dimension measures.

Uto tested the probability formula by first simulating a large number of data sets from randomly generated parameters. The formula was then fitted to each data set, resulting in estimated parameters. Good agreement between the true and the estimated parameters was obtained, showing that the model works well. Moreover, specific simulations showed that the inclusion of rater characteristics led to more accurate estimates of examinee ability.
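This kind of simulate-then-fit check (often called a parameter-recovery study) can be sketched in Python. To keep the example short it uses a much simpler model than Uto's: a dichotomous Rasch model with only examinee ability and item difficulty, fitted by plain gradient ascent with a small ridge penalty. The model, the estimation method, the sample sizes and all names are assumptions chosen for illustration and do not reproduce the procedure used in the paper.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def simulate_responses(theta, beta, rng):
    """Draw 0/1 responses with P(correct) = sigmoid(theta_j - beta_i)."""
    p = sigmoid(theta[:, None] - beta[None, :])
    return (rng.random(p.shape) < p).astype(float)

def fit_rasch(responses, n_iter=500, ridge=0.1):
    """Joint maximum-likelihood fit by gradient ascent. The small ridge
    penalty keeps all-correct and all-wrong rows at finite values."""
    n_examinees, n_items = responses.shape
    theta = np.zeros(n_examinees)
    beta = np.zeros(n_items)
    for _ in range(n_iter):
        resid = responses - sigmoid(theta[:, None] - beta[None, :])
        theta += (resid.sum(axis=1) - ridge * theta) / n_items
        beta += (-resid.sum(axis=0) - ridge * beta) / n_examinees
        beta -= beta.mean()           # fix the location of the scale
    return theta, beta

rng = np.random.default_rng(1)
true_theta = rng.normal(0.0, 1.0, 300)    # randomly generated "true" abilities
true_beta = rng.normal(0.0, 1.0, 30)      # randomly generated "true" difficulties
true_beta -= true_beta.mean()
data = simulate_responses(true_theta, true_beta, rng)
est_theta, est_beta = fit_rasch(data)
r_theta = np.corrcoef(true_theta, est_theta)[0, 1]
r_beta = np.corrcoef(true_beta, est_beta)[0, 1]
print(f"ability r = {r_theta:.2f}, difficulty r = {r_beta:.2f}")
```

A high correlation between true and estimated parameters is the sort of agreement a recovery study looks for. A full study would repeat this over many simulated data sets and report error measures as well.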

The model was also tested in experiments with actual data, with 134 Japanese university students performing an essay-writing task requiring no preliminary knowledge. One conclusion was that, for this case, a two-dimensional ability assumption worked better than a one-dimensional one. A further finding was that the inclusion of rater characteristics indeed improved model fitting.

Uto plans to further test the model’s effectiveness using more varied and larger datasets, and to, quoting the researcher, “extend the proposed model to four-way data consisting of examinees × raters × evaluation items × performance tasks because practical tests often include several tasks.”


Masaki Uto, A multidimensional generalized many–facet Rasch model for rubric-based performance assessment, Behaviormetrika 48, 425–457 (2021).


DOI: 10.1007/s41237-021-00144-w

Researcher Video Profiles

Toru Nakashika, Associate Professor, Department of Computer and Network Engineering, Graduate School of Informatics and Engineering, University of Electro-Communications, Tokyo

Speech Signal Processing Based on Shallow Neural Networks

In this video feature Toru Nakashika describes his group’s research on speech signal processing using shallow neural networks.

It is widely known that deep learning (DL) is used in audio signal processing. Many studies following this approach increase the number of neural network layers and parameters in the “dark cloud” to improve expressiveness and accuracy.

However, such models have problems such as high calculation costs and the need for huge amounts of data. Furthermore, DL is often called a black box, and it is difficult to interpret what is being done internally. Therefore, it is difficult to come up with ideas for improvements.

“The goal of my research is to produce the same level of accuracy in speech recognition and synthesis as in deep learning but by using interpretable and shallow models based on appropriately expressing the structure of speech data,” explains Nakashika. “That is, by using wisdom instead of computational resources, we aim to reduce computational costs and achieve more practical speech recognition and speech synthesis.”

Nakashika and his colleagues use shallow models, including the Boltzmann machine, which is both shallow and interpretable. The use of a Boltzmann machine enables the expression of an arbitrary probability distribution by freely designing so-called energy functions, and audio data structures can be appropriately expressed using this model.

Since this Boltzmann machine is a shallow model, it has the advantage that both calculation costs and the amount of data required for learning can be significantly reduced.
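How an energy function defines a probability distribution can be shown with a very small Python sketch. It uses the textbook binary restricted Boltzmann machine, with energy E(v, h) = -b·v - c·h - vᵀWh and p(v, h) proportional to exp(-E), and enumerates every state of a model with two visible and two hidden units. This is a generic illustration, not one of Nakashika's models, and the weights and biases are made-up values.

```python
import itertools
import numpy as np

def energy(v, h, W, b, c):
    """Energy of a binary restricted Boltzmann machine."""
    return -(b @ v) - (c @ h) - (v @ W @ h)

def joint_distribution(W, b, c):
    """Exact p(v, h) = exp(-E(v, h)) / Z, found by enumerating every
    binary state. Only feasible for very small models."""
    n_v, n_h = W.shape
    states = [(np.array(v, dtype=float), np.array(h, dtype=float))
              for v in itertools.product([0, 1], repeat=n_v)
              for h in itertools.product([0, 1], repeat=n_h)]
    energies = np.array([energy(v, h, W, b, c) for v, h in states])
    weights = np.exp(-energies)
    return states, energies, weights / weights.sum()

# Illustrative parameters only (assumed, not taken from any paper).
W = np.array([[2.0, 0.0], [0.0, -1.0]])
b = np.array([0.0, 0.5])
c = np.array([-0.5, 0.0])

states, energies, probs = joint_distribution(W, b, c)
best = int(np.argmax(probs))
print("most probable state:", states[best], "p =", round(float(probs[best]), 3))
```

Changing the form of `energy` changes which configurations are likely, which is the sense in which the distribution is designed through the energy function.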

Some recent results obtained by Nakashika include voice identity conversion, a technology that processes a voice and converts only the speaker’s identity without changing the contents of the utterance. “I have proposed a model called the speaker-cluster-adaptive restricted Boltzmann machine,” says Nakashika. “This is an extension of the Boltzmann machine, and conversion is possible with only about one second of data.”

Nakashika has also proposed the so-called complex-valued restricted Boltzmann machine model that directly expresses complex numbers. Sound is often expressed in a complex spectrum, but since it is known that amplitude is better recognized by humans than phase, it is possible to omit the phase and only use the amplitude spectrum. “I think that it would be more expressive if there was a model that could directly express the phase, and the model that can directly express the complex spectrum of the voice is the complex-restricted Boltzmann machine mentioned earlier,” says Nakashika. “We showed that this makes it possible to synthesize speech with higher accuracy than the conventional VOice enCODER.”
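The difference between a complex spectrum and an amplitude-only spectrum can be seen in a few lines of Python. The sketch below takes the Fourier transform of a toy two-tone frame, splits it into amplitude and phase, and resynthesizes the frame with and without the phase. The sampling rate, frame length and tones are arbitrary assumptions, and the example does not involve a Boltzmann machine or a vocoder.

```python
import numpy as np

fs = 16000                                   # assumed sampling rate in Hz
t = np.arange(512) / fs                      # one short analysis frame
frame = (np.sin(2 * np.pi * 500 * t + 0.7)
         + 0.5 * np.sin(2 * np.pi * 1500 * t + 2.0))

spectrum = np.fft.rfft(frame)                # complex spectrum
amplitude = np.abs(spectrum)
phase = np.angle(spectrum)

# Resynthesis from amplitude and phase recovers the frame.
with_phase = np.fft.irfft(amplitude * np.exp(1j * phase), n=frame.size)
# Resynthesis from amplitude alone (phase set to zero) does not.
amplitude_only = np.fft.irfft(amplitude, n=frame.size)

err_with_phase = float(np.max(np.abs(with_phase - frame)))
err_amplitude_only = float(np.max(np.abs(amplitude_only - frame)))
print(f"max error with phase: {err_with_phase:.1e}")
print(f"max error, amplitude only: {err_amplitude_only:.2f}")
```

The amplitude-only version keeps the same frequency content but a different waveform, which is why methods that drop the phase must estimate or reconstruct it at synthesis time.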

Plans include extending the application of the Boltzmann machine beyond speech synthesis and voice quality conversion to other fields of speech signal processing, such as speech recognition and sound source separation. “I would like to encourage more research on shallow neural networks.”

References and further information

Toru Nakashika and Kohei Yatabe, “Gamma Boltzmann Machine for Audio Modeling,” IEEE/ACM Transactions on Audio, Speech and Language Processing, Vol. 29, pp. 2591–2605, July 2021.



Fascination with digital holograms and their applications for imaging through semi-opaque materials

Eriko Watanabe, Associate Professor, Department of Engineering Science, UEC Tokyo.

“My interest in light and optics was triggered when I saw an exhibition on holography at an event on campus during my undergraduate days at university,” says Eriko Watanabe, an associate professor at the Department of Engineering Science, UEC Tokyo. “I was intrigued by the amazing three-dimensional optical structures that could be produced by the interference of light waves. This fascination with holograms is the basis for my current research.”

Recent research being conducted by members of the Watanabe Group includes digital holography imaging of objects hidden by media such as scatter plates and biological tissues. “The ultimate goal is to develop technology for the non-invasive imaging of living cells inside biological tissues,” explains Watanabe. “We expect our research will play an important role in clarifying biological mechanisms governing human health on the cellular level.” Other potential applications of this technology include imaging through fog and air turbulence. The latter is important for ground-based astronomy, where movements of the air can adversely affect astrophotography.

Specific scientific issues to resolve to achieve these goals are (1) elimination of temporally fluctuating spatial noise due to complex fluctuations and scatterers to capture images behind obtrusive objects, and (2) development of microscopic imaging technology for visualizing below living skin.

One solution proposed by the Watanabe Group is using deep neural networks to suppress temporally fluctuating spatial noise and applying optical correlation imaging. “Our imaging method combines deep learning with optical correlation imaging that accelerates ordinary single pixel imaging by the use of optical computing,” explains Watanabe. “Furthermore, we are imaging behind scattering media by phase shift digital holography using near-point light sources with planar waveguides. Using a near-point light source eliminates fluctuations with common optical path digital holography and planar waveguides take us closer towards ‘needle-type’ probe structures.”

Further information

University of Electro-Communications
1-5-1 Chofugaoka, Chofu, Tokyo 182-8585

About the University of Electro-Communications

The University of Electro-Communications (UEC) in Tokyo is a small, luminous university at the forefront of pure and applied sciences, engineering, and technology research. Its roots go back to the Technical Institute for Wireless Communications, which was established in 1918 by the Wireless Association to train so-called wireless engineers in maritime communications in response to the Titanic disaster in 1912. In 1949, the UEC was established as a national university by the Japanese Ministry of Education, and in 1957 it moved from Meguro to its current Chofu campus in Tokyo.

With approximately 4,000 students and 350 faculty members, UEC is regarded as a small university, but it has expertise in wireless communications, laser science, robotics, informatics, and material science, to name just a few areas of research.

The UEC was selected for the Ministry of Education, Culture, Sports, Science and Technology (MEXT) Program for Promoting the Enhancement of Research Universities as a result of its strengths in three main areas: optics and photonics research, where we are number one for the number of joint publications with foreign researchers; wireless communications, which reflects our roots; and materials-based research, particularly on fuel cells.


SOURCE University of Electro-Communications
