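I tracked this down to the dispatch calls passing in 0 thread groups. uThreadGroupCount / 4 uses integer division, so it evaluates to 0 whenever uThreadGroupCount is less than 4: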
RunComputeShader( pContext, m_pTryModeG10CS, pSRVs, 2, pCBCS, pErrBestModeUAV[0], uThreadGroupCount / 4, 1, 1 );
should be:
RunComputeShader(pContext, m_pTryModeG10CS, pSRVs, 2, pCBCS, pErrBestModeUAV[0], __max(uThreadGroupCount / 4, 1), 1, 1);
The same __max() clamp should be added to the subsequent RunComputeShader calls as well. The BC7 path already does this, but the BC6H path does not.