1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119
|
/* -*- Mode: C; tab-width: 4; indent-tabs-mode: nil; c-basic-offset: 2 -*- */
/* -*- C++ -*-
* Copyright (C) 1998 <developer@mozilla.org>
*
*
* Permission is hereby granted, free of charge, to any person obtaining
* a copy of this software and associated documentation files (the
* "Software"), to deal in the Software without restriction, including
* without limitation the rights to use, copy, modify, merge, publish,
* distribute, sublicense, and/or sell copies of the Software, and to
* permit persons to whom the Software is furnished to do so, subject to
* the following conditions:
*
* The above copyright notice and this permission notice shall be included
* in all copies or substantial portions of the Software.
*
* THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
* EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
* MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
* NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE
* LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION
* OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION
* WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
*/
#include "nsSBCharSetProber.h"
#include <stdio.h>
namespace qencodingprober {
nsProbingState nsSingleByteCharSetProber::HandleData(const char* aBuf, unsigned int aLen)
{
unsigned char order;
for (unsigned int i = 0; i < aLen; i++)
{
order = mModel->charToOrderMap[(unsigned char)aBuf[i]];
if (order < SYMBOL_CAT_ORDER)
mTotalChar++;
if (order < SAMPLE_SIZE)
{
mFreqChar++;
if (mLastOrder < SAMPLE_SIZE)
{
mTotalSeqs++;
if (!mReversed)
++(mSeqCounters[(int)mModel->precedenceMatrix[mLastOrder*SAMPLE_SIZE+order]]);
else // reverse the order of the letters in the lookup
++(mSeqCounters[(int)mModel->precedenceMatrix[order*SAMPLE_SIZE+mLastOrder]]);
}
}
mLastOrder = order;
}
if (mState == eDetecting)
if (mTotalSeqs > SB_ENOUGH_REL_THRESHOLD)
{
float cf = GetConfidence();
if (cf > POSITIVE_SHORTCUT_THRESHOLD)
mState = eFoundIt;
else if (cf < NEGATIVE_SHORTCUT_THRESHOLD)
mState = eNotMe;
}
return mState;
}
void nsSingleByteCharSetProber::Reset(void)
{
mState = eDetecting;
mLastOrder = 255;
for (unsigned int i = 0; i < NUMBER_OF_SEQ_CAT; i++)
mSeqCounters[i] = 0;
mTotalSeqs = 0;
mTotalChar = 0;
mFreqChar = 0;
}
//#define NEGATIVE_APPROACH 1
float nsSingleByteCharSetProber::GetConfidence(void)
{
#ifdef NEGATIVE_APPROACH
if (mTotalSeqs > 0)
if (mTotalSeqs > mSeqCounters[NEGATIVE_CAT]*10 )
return ((float)(mTotalSeqs - mSeqCounters[NEGATIVE_CAT]*10))/mTotalSeqs * mFreqChar / mTotalChar;
return (float)0.01;
#else //POSITIVE_APPROACH
float r;
if (mTotalSeqs > 0) {
r = ((float)1.0) * mSeqCounters[POSITIVE_CAT] / mTotalSeqs / mModel->mTypicalPositiveRatio;
r = r*mFreqChar/mTotalChar;
if (r >= (float)1.00)
r = (float)0.99;
return r;
}
return (float)0.01;
#endif
}
const char* nsSingleByteCharSetProber::GetCharSetName()
{
if (!mNameProber)
return mModel->charsetName;
return mNameProber->GetCharSetName();
}
#ifdef DEBUG_PROBE
void nsSingleByteCharSetProber::DumpStatus()
{
printf(" SBCS: %1.3f [%s]\r\n", GetConfidence(), GetCharSetName());
}
#endif
}
|