[meta-freescale] i.mx6sx linux-fslc sema4 bug

Petr Kubizňák kubiznak.petr at elnico.cz
Wed Oct 11 07:45:40 PDT 2017


This is an official report of the issue originally reported at 
https://community.nxp.com/thread/461521 .


    How to reproduce

I hit on a problem in linux-fslc 4.1.38 (default in yocto-morty), 
available at GitHub - Freescale/linux-fslc: Linux kernel source tree 
<https://community.nxp.com/external-link.jspa?url=https%3A%2F%2Fgithub.com%2FFreescale%2Flinux-fslc%2Ftree%2F4.1-2.0.x-imx> 
. It is related to the sema4 mutex locking mechanism 
(drivers/char/imx_amp/imx_sema4.c). The problem only arises when the M4 
core is running and its RDC domain is set. I'm running a trivial 
FreeRTOS program based on the hello_world demo from the NXP's official 
distribution 
(https://www.nxp.com/webapp/Download?colCode=FreeRTOS_MX6SX_1.0.1_WIN&appType=license&Parent_nodeId=142289606182472763482… 
<https://community.nxp.com/external-link.jspa?url=https%3A%2F%2Fwww.nxp.com%2Fwebapp%2FDownload%3FcolCode%3DFreeRTOS_MX6SX_1.0.1_WIN%26appType%3Dlicense%26Parent_nodeId%3D1422896061824727634827%26Parent_pageType%3Dproduct> 
). The program simply sets the RDC domain and loops forever:

|RDC_SetDomainID(RDC, rdcMdaM4, BOARD_DOMAIN_ID, false); while (1) ;|

With Yocto (Morty), I built the qt4e-demo-image (fslc-framebuffer 
distro). When I boot the image, qtdemoE demo starts. The problem can be 
reproduced with an intensive I2C communication. In my case, I touch on 
the screen wildly, generating the touch events. My interrupt-driven 
touchscreen driver calls i2c_smbus_read_byte_data() to read the data.


    What it does

After a while, the following message starts printing in the terminal 
infinitely:

drivers/char/imx_amp/imx_sema4.c -> _imx_sema4_mutex_lock 137 already locked, wait! num 6 val 1

The FreeRTOS on the M4 core continues without problem. If I stop the 
debugger and check the SEMA4_Gate06 register, its value is set to 1, 
indicating it is locked by the A9 core.


    Workaround

After spending hours on this problem, I formed a testing patch (see 
attached). In the _imx_sema4_mutex_lock() function, it disables the 
error log for the first 5 occurences. As a result, instead of being 
imprisoned in the infinite error message loop, I just get a single 
message and everything seems to work fine:

286 Trying to unlock an unlock mutex.

When this repeats 5-times (which might take a while), the system enters 
the infinite message loop. Including my added prints, the output looks 
as follows:

286 Trying to unlock an unlock mutex.
286 Trying to unlock an unlock mutex.
286 Trying to unlock an unlock mutex.
286 Trying to unlock an unlock mutex.
286 Trying to unlock an unlock mutex.
drivers/char/imx_amp/imx_sema4.c -> _imx_sema4_mutex_lock 145 already locked, wait! num 6 val 1. 6
CPU: 0 PID: 66 Comm: irq/203-crtouch Tainted: G           O    4.1.38-fslc-01914-gb3f76d9ae87e-dirty #12
Hardware name: Freescale i.MX6 SoloX (Device Tree)
[<800184e0>] (unwind_backtrace) from [<80013cfc>] (show_stack+0x20/0x24)
[<80013cfc>] (show_stack) from [<8091b2ec>] (dump_stack+0x80/0x94)
[<8091b2ec>] (dump_stack) from [<80406be8>] (_imx_sema4_mutex_lock+0x11c/0x1c4)
[<80406be8>] (_imx_sema4_mutex_lock) from [<80406d58>] (imx_sema4_mutex_lock+0xa4/0x128)
[<80406d58>] (imx_sema4_mutex_lock) from [<80025738>] (clk_gate2_do_shared_clks+0x78/0xe4)
[<80025738>] (clk_gate2_do_shared_clks) from [<800258b8>] (clk_gate2_disable+0x58/0x7c)
[<800258b8>] (clk_gate2_disable) from [<806b6838>] (clk_core_disable+0x64/0x1e4)
[<806b6838>] (clk_core_disable) from [<806b7630>] (clk_disable+0x34/0x40)
[<806b7630>] (clk_disable) from [<805d2eec>] (i2c_imx_xfer+0x194/0xe28)
[<805d2eec>] (i2c_imx_xfer) from [<805ce5a0>] (__i2c_transfer+0x160/0x678)
[<805ce5a0>] (__i2c_transfer) from [<805ceb2c>] (i2c_transfer+0x74/0xa0)
[<805ceb2c>] (i2c_transfer) from [<805cf1a4>] (i2c_smbus_xfer+0x590/0x9d8)
[<805cf1a4>] (i2c_smbus_xfer) from [<805cf7ec>] (i2c_smbus_read_byte_data+0x3c/0x4c)
[<805cf7ec>] (i2c_smbus_read_byte_data) from [<805c07d0>] (crtouch_ts_interrupt+0x90/0x14c)
[<805c07d0>] (crtouch_ts_interrupt) from [<8007e824>] (irq_thread_fn+0x2c/0x50)
[<8007e824>] (irq_thread_fn) from [<8007eb80>] (irq_thread+0x13c/0x188)
[<8007eb80>] (irq_thread) from [<800576b4>] (kthread+0xec/0x104)
[<800576b4>] (kthread) from [<8000fca8>] (ret_from_fork+0x14/0x2c)
l : 97284, 8852a240, clk_gate2_do_shared_clks+0x78/0xe4
lo: 97284, 8852a240, clk_pllv3_do_shared_clks+0x78/0xe4
k : 97284, 8852a240, clk_gate2_do_shared_clks+0x78/0xe4
ko: 97284, 8852a240, clk_pllv3_do_shared_clks+0x78/0xe4
u : 97282, 8852a240, clk_gate2_do_shared_clks+0xa4/0xe4
uo: 97180, 8852a240, clk_pllv3_do_shared_clks+0xa4/0xe4
drivers/char/imx_amp/imx_sema4.c -> _imx_sema4_mutex_lock 145 already locked, wait! num 6 val 1. 7
CPU: 0 PID: 66 Comm: irq/203-crtouch Tainted: G           O    4.1.38-fslc-01914-gb3f76d9ae87e-dirty #12
Hardware name: Freescale i.MX6 SoloX (Device Tree)
[<800184e0>] (unwind_backtrace) from [<80013cfc>] (show_stack+0x20/0x24)
[<80013cfc>] (show_stack) from [<8091b2ec>] (dump_stack+0x80/0x94)
[<8091b2ec>] (dump_stack) from [<80406be8>] (_imx_sema4_mutex_lock+0x11c/0x1c4)
[<80406be8>] (_imx_sema4_mutex_lock) from [<80406d88>] (imx_sema4_mutex_lock+0xd4/0x128)
[<80406d88>] (imx_sema4_mutex_lock) from [<80025738>] (clk_gate2_do_shared_clks+0x78/0xe4)
[<80025738>] (clk_gate2_do_shared_clks) from [<800258b8>] (clk_gate2_disable+0x58/0x7c)
[<800258b8>] (clk_gate2_disable) from [<806b6838>] (clk_core_disable+0x64/0x1e4)
[<806b6838>] (clk_core_disable) from [<806b7630>] (clk_disable+0x34/0x40)
[<806b7630>] (clk_disable) from [<805d2eec>] (i2c_imx_xfer+0x194/0xe28)
[<805d2eec>] (i2c_imx_xfer) from [<805ce5a0>] (__i2c_transfer+0x160/0x678)
[<805ce5a0>] (__i2c_transfer) from [<805ceb2c>] (i2c_transfer+0x74/0xa0)
[<805ceb2c>] (i2c_transfer) from [<805cf1a4>] (i2c_smbus_xfer+0x590/0x9d8)
[<805cf1a4>] (i2c_smbus_xfer) from [<805cf7ec>] (i2c_smbus_read_byte_data+0x3c/0x4c)
[<805cf7ec>] (i2c_smbus_read_byte_data) from [<805c07d0>] (crtouch_ts_interrupt+0x90/0x14c)
[<805c07d0>] (crtouch_ts_interrupt) from [<8007e824>] (irq_thread_fn+0x2c/0x50)
[<8007e824>] (irq_thread_fn) from [<8007eb80>] (irq_thread+0x13c/0x188)
[<8007eb80>] (irq_thread) from [<800576b4>] (kthread+0xec/0x104)
[<800576b4>] (kthread) from [<8000fca8>] (ret_from_fork+0x14/0x2c)
l : 97284, 8852a240, clk_gate2_do_shared_clks+0x78/0xe4
lo: 97284, 8852a240, clk_pllv3_do_shared_clks+0x78/0xe4
k : 97284, 8852a240, clk_gate2_do_shared_clks+0x78/0xe4
ko: 97284, 8852a240, clk_pllv3_do_shared_clks+0x78/0xe4
u : 97282, 8852a240, clk_gate2_do_shared_clks+0xa4/0xe4
uo: 97180, 8852a240, clk_pllv3_do_shared_clks+0xa4/0xe4
... (repeats infinitely) ...


    The problem

The output shows the problem is in calling the imx_sema4_mutex_lock() 
function from two places at once (clk_gate2_do_shared_clks() and 
clk_pllv3_do_shared_clks()). One comes from the touchscreen interrupt 
handler, second comes from some worker thread, but I don't know more 
details about it.

It is obvious that printing anything to the console from an interrupt 
handler is not okay. So I suggest to disable all debug/error prints in 
the /imx_sema4.c/ file. Then the infinite message loop would be avoided.

The question is, *what is the root cause*. Is the debug print really the 
problem (i.e. it is a valid situation that mutex_ptr->gate_val == 
SEMA4_A9_LOCK), or does it just expose a problem hidden somewhere else 
(i.e. the debug print should actually never appear).

As the aforementioned output shows, the condition (mutex_ptr->gate_val 
!= SEMA4_A9_LOCK) in imx_sema4_mutex_unlock() holds true when the 
problematic situation arises. I guess it is because the gate_val is 
already 0, as the unlock function has been called twice, which 
corresponds to the two locking calls. That suggests the locking 
mechanism does not work correctly.

Is my reasoning correct? How to properly fix it?


    Workaround 2

Recently I found that the /evbug/ kernel module might be incident with 
the issue. When I rmmod evbug, I never see the issue appear. When I 
insmod evbug again, the issue reappears. I don't think it's the root 
cause, though. Is it?


Petr

Petr Kubiznak
Embedded SW Engineer
ELNICO s.r.o.
Heydukova 2923, 544 01 Dvur Kralove n.L., Czech Republic
T: +420 499 321 019 | M: +420 724 053 668
www.elnico.cz <http://www.elnico.cz/> | www.sqm4.com 
<http://www.sqm4.com> | kubiznak.petr at elnico.cz 
<mailto:kubiznak.petr at elnico.cz>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.yoctoproject.org/pipermail/meta-freescale/attachments/20171011/d5dfdf41/attachment.html>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: imx_sema4.patch
Type: text/x-patch
Size: 3124 bytes
Desc: not available
URL: <http://lists.yoctoproject.org/pipermail/meta-freescale/attachments/20171011/d5dfdf41/attachment.bin>


More information about the meta-freescale mailing list