[meta-freescale] imx6 silent memory corruption

Doug Schwanke doug.schwanke at firstviewconsultants.com
Mon Jan 26 06:40:07 PST 2015


> -----Original Message-----
> From: meta-freescale-bounces at yoctoproject.org [mailto:meta-freescale-bounces at yoctoproject.org] On Behalf Of Nikolay Dimitrov
> Sent: Friday, January 23, 2015 3:11 PM
> To: Fabio Estevam
> Cc: meta-freescale at yoctoproject.org
> Subject: Re: [meta-freescale] imx6 silent memory corruption
> 
> Hi Fabio,
> 
> On 01/23/2015 12:25 AM, Fabio Estevam wrote:
> > On Thu, Jan 22, 2015 at 7:25 PM, Nikolay Dimitrov <picmaster at mail.bg> wrote:
> >
> >> I would appreciate it if you could share ideas about what could be wrong
> >> with this setup, and I'd also be happy to hear your suggestions for
> >> similar simple tests of system reliability.
> >
> > Maybe you could try to run the 'memtester' utility and see how your
> > board behaves.
> 
> Thanks for the idea. I ran the tool and it also reports errors, but this happens
> rarely (just like the hash test) and I am still looking for a way to reproduce
> the issue easily. Here's an example of a memory error:
> 
> 
> # memtester 64M 100
> memtester version 4.1.3 (32-bit)
> Copyright (C) 2010 Charles Cazabon.
> Licensed under the GNU General Public License version 2 (only).
> 
> pagesize is 4096
> pagesizemask is 0xfffff000
> want 64MB (67108864 bytes)
> got  64MB (67108864 bytes), trying mlock ...locked.
> Loop 1/100:
>    Stuck Address       : ok
>    Random Value        : ok
> FAILURE: 0xc3909006 != 0xc3909007 at offset 0x00291fac.
>    Compare XOR         :   Compare SUB         : ok
>    Compare MUL         : ok
>    Compare DIV         : ok
>    Compare OR          : ok
>    Compare AND         : ok
>    Sequential Increment: ok
>    Solid Bits          : ok
>    Block Sequential    : ok
>    Checkerboard        : ok
>    Bit Spread          : ok
>    Bit Flip            : ok
>    Walking Ones        : ok
>    Walking Zeroes      : ok
> 
> 
> Memtester can run for hours without finding an issue, and sometimes it runs
> for several minutes and reports a memory error.
> 
> I found another tool, stressapptest (http://stressapptest.googlecode.com/svn/trunk/),
> which also seems to trigger the issue. Here's another example of a memory
> error:
> 
> 
> # ./stressapptest --no_timestamps --printsec 60 -M 64 -s 300
> Log: Commandline - ./stressapptest --no_timestamps --printsec 60 -M 64 -s 300
> Stats: SAT revision 1.0.7_autoconf, 32 bit binary
> Log: picmaster @ riotboard on Fri Jan 23 20:48:49 EET 2015 from open source release
> Log: 1 nodes, 2 cpus.
> Log: Defaulting to 2 copy threads
> Log: Flooring memory allocation to multiple of 4: 64MB
> Log: Prefer plain malloc memory allocation.
> Log: Using mmap() allocation at 0x72430000.
> Stats: Starting SAT, 64M, 300 seconds
> Log: region number 1 exceeds region count 1
> Log: Region mask: 0x1
> Log: Seconds remaining: 240
> Log: Seconds remaining: 180
> Report Error: miscompare : DIMM Unknown : 1 : 134s
> Hardware Error: miscompare on CPU 1(0x2) at 0x74e93040(0x33f0d040:DIMM Unknown): read:0xaaaaaaaaaaaaaa8a, reread:0xaaaaaaaaaaaaaa8a expected:0xaaaaaaaaaaaaaaaa
> Report Error: miscompare : DIMM Unknown : 1 : 136s
> Hardware Error: miscompare on CPU 0(0x1) at 0x75528710(0x32270710:DIMM Unknown): read:0xffffffbfffffffbe, reread:0xffffffbfffffffbe expected:0xffffffbfffffffbf
> Log: Seconds remaining: 120
> Log: Seconds remaining: 60
> Report Error: miscompare : DIMM Unknown : 1 : 266s
> Hardware Error: miscompare on CPU 0(0x1) at 0x74b979d0(0x358ae9d0:DIMM Unknown): read:0x0000001000000000, reread:0x0000001000000000 expected:0x0000001000000010
> Report Error: miscompare : DIMM Unknown : 1 : 274s
> Hardware Error: miscompare on CPU 0(0x1) at 0x73b4cfd0(0x35e8afd0:DIMM Unknown): read:0x0000001000000000, reread:0x0000001000000000 expected:0x0000001000000010
> Log: Thread 1 found 3 hardware incidents
> Log: Thread 2 found 1 hardware incidents
> Stats: Found 4 hardware incidents
> Stats: Completed: 256346.00M in 300.03s 854.40MB/s, with 4 hardware incidents, 0 errors
> Stats: Memory Copy: 256346.00M at 854.46MB/s
> Stats: File Copy: 0.00M at 0.00MB/s
> Stats: Net Copy: 0.00M at 0.00MB/s
> Stats: Data Check: 0.00M at 0.00MB/s
> Stats: Invert Data: 0.00M at 0.00MB/s
> Stats: Disk: 0.00M at 0.00MB/s
> 
> Status: FAIL - test discovered HW problems
> 
> 
> I plan to run the FSL DDR stress test again to see whether it
> detects issues with my DDR memory. My board uses a SO-DIMM DDR3, and I
> am also thinking of trying another SO-DIMM module to see whether
> it makes any difference.
> 
> Thanks for the ideas so far. This is a major problem for me so I need
> to resolve it before doing anything else on this board.
>

Have you read ERR005198 in the Chip Errata for the i.MX 6Dual/6Quad?
http://cache.freescale.com/files/32bit/doc/errata/IMX6DQCE.pdf
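
As a rough aid for reading the miscompares quoted above, here is a small Python sketch that XORs each "read" value against its "expected" value and lists the bits that differ. The (read, expected) pairs are copied from the stressapptest log in this thread; the names MISCOMPARES, flipped_bits and summarize are made up for illustration and are not part of stressapptest or memtester.

```python
# (read, expected) pairs copied from the stressapptest "Hardware Error"
# lines quoted in this thread. Names here are illustrative only.
MISCOMPARES = [
    (0xaaaaaaaaaaaaaa8a, 0xaaaaaaaaaaaaaaaa),
    (0xffffffbfffffffbe, 0xffffffbfffffffbf),
    (0x0000001000000000, 0x0000001000000010),
    (0x0000001000000000, 0x0000001000000010),
]


def flipped_bits(read, expected):
    """Return (bit_index, direction) for every bit where read != expected."""
    diff = read ^ expected  # set bits mark the positions that differ
    result = []
    bit = 0
    while diff:
        if diff & 1:
            # Direction is described relative to the expected value.
            direction = "1->0" if (expected >> bit) & 1 else "0->1"
            result.append((bit, direction))
        diff >>= 1
        bit += 1
    return result


def summarize(pairs):
    """Apply flipped_bits to each (read, expected) pair."""
    return [flipped_bits(read, expected) for read, expected in pairs]


if __name__ == "__main__":
    for (read, expected), flips in zip(MISCOMPARES, summarize(MISCOMPARES)):
        print(f"read={read:#018x} expected={expected:#018x} flips={flips}")
```

For the four values in the log, each miscompare comes out as a single bit in the low byte (bit 5, bit 0, bit 4, bit 4) that was read as 0 where 1 was expected. The memtester failure (0xc3909006 vs 0xc3909007) also differs only in bit 0, although memtester does not say which of the two values is the correct one.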

-Doug Schwanke

> Kind regards,
> Nikolay

