[meta-freescale] imx6 silent memory corruption

Nikolay Dimitrov picmaster at mail.bg
Fri Jan 23 13:11:28 PST 2015


Hi Fabio,

On 01/23/2015 12:25 AM, Fabio Estevam wrote:
> On Thu, Jan 22, 2015 at 7:25 PM, Nikolay Dimitrov <picmaster at mail.bg> wrote:
>
>> I will appreciate if you can share ideas what could be wrong with this
>> setup, and also I'll be happy to hear from you suggestions for similar
>> simple tests for system reliability.
>
> Maybe you could try to run the 'memtester' utility and see how your
> board behaves.

Thanks for the idea. I ran the tool and it also reports errors, but
only rarely (just like the hash test), and I am still looking for a way
to reproduce the issue easily. Here's an example of a memory error:


# memtester 64M 100
memtester version 4.1.3 (32-bit)
Copyright (C) 2010 Charles Cazabon.
Licensed under the GNU General Public License version 2 (only).

pagesize is 4096
pagesizemask is 0xfffff000
want 64MB (67108864 bytes)
got  64MB (67108864 bytes), trying mlock ...locked.
Loop 1/100:
   Stuck Address       : ok
   Random Value        : ok
FAILURE: 0xc3909006 != 0xc3909007 at offset 0x00291fac.
   Compare XOR         :   Compare SUB         : ok
   Compare MUL         : ok
   Compare DIV         : ok
   Compare OR          : ok
   Compare AND         : ok
   Sequential Increment: ok
   Solid Bits          : ok
   Block Sequential    : ok
   Checkerboard        : ok
   Bit Spread          : ok
   Bit Flip            : ok
   Walking Ones        : ok
   Walking Zeroes      : ok


Memtester can run for hours without finding an issue, and sometimes it
runs for several minutes and reports a memory error.
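
One way to catch such a rare failure unattended is to rerun memtester in
a loop and stop at the first non-zero exit status. The following is only
a minimal Python sketch, not something that was run on this board. The
helper names are made up for illustration, and the exit-status bits
(0x01, 0x02, 0x04) are taken from the memtester man page, so they should
be checked against the installed version.

```python
import subprocess
import time

# Exit-status bits as described in the memtester man page (assumption:
# verify against the installed version).
EXIT_BITS = {
    0x01: "allocation, locking or invocation error",
    0x02: "stuck address test failed",
    0x04: "another test failed",
}


def decode_exit_status(status):
    """Return the descriptions of all known bits set in an exit status."""
    return [text for bit, text in sorted(EXIT_BITS.items()) if status & bit]


def run_until_failure(cmd, max_runs=1000):
    """Rerun cmd until it exits non-zero.

    Returns (run_number, elapsed_seconds, status) for the first failing
    run, or None if all max_runs runs exit with status 0.
    """
    start = time.monotonic()
    for run in range(1, max_runs + 1):
        status = subprocess.call(cmd)
        if status != 0:
            return run, time.monotonic() - start, status
    return None


# Example (hypothetical invocation, one 64 MB loop per run):
# result = run_until_failure(["memtester", "64M", "1"])
# if result:
#     run, seconds, status = result
#     print(run, round(seconds), decode_exit_status(status))
```

Recording the run number and elapsed time of each failure over several
sessions would give a rough failure rate to compare between setups.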

I also found another tool, stressapptest
(http://stressapptest.googlecode.com/svn/trunk/), which also seems to
trigger the issue. Here's an example of the memory errors it reports:


# ./stressapptest --no_timestamps --printsec 60 -M 64 -s 300
Log: Commandline - ./stressapptest --no_timestamps --printsec 60 -M 64 -s 300
Stats: SAT revision 1.0.7_autoconf, 32 bit binary
Log: picmaster @ riotboard on Fri Jan 23 20:48:49 EET 2015 from open source release
Log: 1 nodes, 2 cpus.
Log: Defaulting to 2 copy threads
Log: Flooring memory allocation to multiple of 4: 64MB
Log: Prefer plain malloc memory allocation.
Log: Using mmap() allocation at 0x72430000.
Stats: Starting SAT, 64M, 300 seconds
Log: region number 1 exceeds region count 1
Log: Region mask: 0x1
Log: Seconds remaining: 240
Log: Seconds remaining: 180
Report Error: miscompare : DIMM Unknown : 1 : 134s
Hardware Error: miscompare on CPU 1(0x2) at 0x74e93040(0x33f0d040:DIMM Unknown): read:0xaaaaaaaaaaaaaa8a, reread:0xaaaaaaaaaaaaaa8a expected:0xaaaaaaaaaaaaaaaa
Report Error: miscompare : DIMM Unknown : 1 : 136s
Hardware Error: miscompare on CPU 0(0x1) at 0x75528710(0x32270710:DIMM Unknown): read:0xffffffbfffffffbe, reread:0xffffffbfffffffbe expected:0xffffffbfffffffbf
Log: Seconds remaining: 120
Log: Seconds remaining: 60
Report Error: miscompare : DIMM Unknown : 1 : 266s
Hardware Error: miscompare on CPU 0(0x1) at 0x74b979d0(0x358ae9d0:DIMM Unknown): read:0x0000001000000000, reread:0x0000001000000000 expected:0x0000001000000010
Report Error: miscompare : DIMM Unknown : 1 : 274s
Hardware Error: miscompare on CPU 0(0x1) at 0x73b4cfd0(0x35e8afd0:DIMM Unknown): read:0x0000001000000000, reread:0x0000001000000000 expected:0x0000001000000010
Log: Thread 1 found 3 hardware incidents
Log: Thread 2 found 1 hardware incidents
Stats: Found 4 hardware incidents
Stats: Completed: 256346.00M in 300.03s 854.40MB/s, with 4 hardware incidents, 0 errors
Stats: Memory Copy: 256346.00M at 854.46MB/s
Stats: File Copy: 0.00M at 0.00MB/s
Stats: Net Copy: 0.00M at 0.00MB/s
Stats: Data Check: 0.00M at 0.00MB/s
Stats: Invert Data: 0.00M at 0.00MB/s
Stats: Disk: 0.00M at 0.00MB/s

Status: FAIL - test discovered HW problems
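
A quick way to read these miscompares is to XOR the two values of each
pair and list the bit positions that differ. Below is a minimal Python
sketch; the values are copied from the memtester and stressapptest logs
above, and the function name is made up for illustration.

```python
# Value pairs copied from the logs above. memtester only reports two
# differing values; stressapptest reports (read, expected).
PAIRS = [
    ("memtester",     0xc3909006,         0xc3909007),
    ("stressapptest", 0xaaaaaaaaaaaaaa8a, 0xaaaaaaaaaaaaaaaa),
    ("stressapptest", 0xffffffbfffffffbe, 0xffffffbfffffffbf),
    ("stressapptest", 0x0000001000000000, 0x0000001000000010),
    ("stressapptest", 0x0000001000000000, 0x0000001000000010),
]


def differing_bits(a, b):
    """Return the positions (0 = least significant) where a and b differ."""
    diff = a ^ b
    return [i for i in range(diff.bit_length()) if (diff >> i) & 1]


for tool, first, second in PAIRS:
    print(f"{tool:14s} xor=0x{first ^ second:x} bits={differing_bits(first, second)}")
```

For the values in this report, every pair differs in exactly one bit
(bit 0, 4 or 5), always in the least significant byte, and in the
stressapptest cases that bit was read as 0 where 1 was expected.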


I plan to run the FSL DDR stress test again to see whether it detects
issues with my DDR memory. My board uses a DDR3 SO-DIMM, and I am also
thinking of trying another SO-DIMM module to see whether it makes any
difference.

Thanks for the ideas so far. This is a major problem for me so I need
to resolve it before doing anything else on this board.

Kind regards,
Nikolay

