From ron@debian.org  Tue May  1 01:41:13 2012
Return-Path: <ron@debian.org>
X-Original-To: codec@ietfa.amsl.com
Delivered-To: codec@ietfa.amsl.com
Received: from localhost (localhost [127.0.0.1]) by ietfa.amsl.com (Postfix) with ESMTP id 753D621F86D4; Tue,  1 May 2012 01:41:13 -0700 (PDT)
X-Virus-Scanned: amavisd-new at amsl.com
X-Spam-Flag: NO
X-Spam-Score: -1.423
X-Spam-Level: 
X-Spam-Status: No, score=-1.423 tagged_above=-999 required=5 tests=[BAYES_00=-2.599, FH_HOST_EQ_D_D_D_D=0.765, HOST_MISMATCH_NET=0.311, RDNS_DYNAMIC=0.1]
Received: from mail.ietf.org ([12.22.58.30]) by localhost (ietfa.amsl.com [127.0.0.1]) (amavisd-new, port 10024) with ESMTP id UbWFe26rwips; Tue,  1 May 2012 01:41:13 -0700 (PDT)
Received: from ipmail06.adl2.internode.on.net (unknown [IPv6:2001:44b8:8060:ff02:300:1:2:6]) by ietfa.amsl.com (Postfix) with ESMTP id 9083421F86D3; Tue,  1 May 2012 01:41:12 -0700 (PDT)
X-IronPort-Anti-Spam-Filtered: true
X-IronPort-Anti-Spam-Result: AgsFAByhn095LXH5/2dsb2JhbABErwJlgwCBCIIJAQEFOhwjEAsYIwsUGA0kJ4d4ui+QImMEiGKFL4dsAZBCgng
Received: from ppp121-45-113-249.lns20.adl6.internode.on.net (HELO audi.shelbyville.oz) ([121.45.113.249]) by ipmail06.adl2.internode.on.net with ESMTP; 01 May 2012 18:11:10 +0930
Received: from localhost (localhost [127.0.0.1]) by audi.shelbyville.oz (Postfix) with ESMTP id 9A8BC4F8F3; Tue,  1 May 2012 18:11:08 +0930 (CST)
X-Virus-Scanned: Debian amavisd-new at audi.shelbyville.oz
Received: from audi.shelbyville.oz ([127.0.0.1]) by localhost (audi.shelbyville.oz [127.0.0.1]) (amavisd-new, port 10024) with LMTP id EjQfgvkuUomE; Tue,  1 May 2012 18:11:07 +0930 (CST)
Received: by audi.shelbyville.oz (Postfix, from userid 1000) id 7DA934F8FE; Tue,  1 May 2012 18:11:07 +0930 (CST)
Date: Tue, 1 May 2012 18:11:07 +0930
From: Ron <ron@debian.org>
To: SM <sm@resistor.net>
Message-ID: <20120501084107.GB18009@audi.shelbyville.oz>
References: <6.2.5.6.2.20120430120153.0947ed48@resistor.net> <CBC4E0F3.867E4%stewe@stewe.org> <6.2.5.6.2.20120430223624.0c706828@resistor.net>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <6.2.5.6.2.20120430223624.0c706828@resistor.net>
User-Agent: Mutt/1.5.20 (2009-06-14)
Cc: codec@ietf.org, ietf@ietf.org
Subject: Re: [codec] Last Call: <draft-ietf-codec-opus-12.txt> (Definition of the Opus Audio Codec) to Proposed Standard
X-BeenThere: codec@ietf.org
X-Mailman-Version: 2.1.12
Precedence: list
List-Id: Codec WG <codec.ietf.org>
List-Unsubscribe: <https://www.ietf.org/mailman/options/codec>, <mailto:codec-request@ietf.org?subject=unsubscribe>
List-Archive: <http://www.ietf.org/mail-archive/web/codec>
List-Post: <mailto:codec@ietf.org>
List-Help: <mailto:codec-request@ietf.org?subject=help>
List-Subscribe: <https://www.ietf.org/mailman/listinfo/codec>, <mailto:codec-request@ietf.org?subject=subscribe>
X-List-Received-Date: Tue, 01 May 2012 08:41:13 -0000

On Mon, Apr 30, 2012 at 11:28:59PM -0700, SM wrote:
> At 18:40 30-04-2012, Ron wrote:
> >If this clause becomes a blocker, then we should simply remove it, but in that
> >case it would be good to have clear reasons why it became a blocker, since the
> >things you say you fear here, I see as already being prohibited anyway.
> 
> The text in Section 10 is ambiguous.
> 
> Given all the efforts that went into RFC 6569, it's odd to see the
> text being discussed during the Last Call instead of the WGLC.

Nobody raised any questions about it to discuss until Robert's AD review.
And since it's essentially the same text as had been included in previously
published RFCs, that didn't seem particularly surprising either.

Whatever resolution we arrive at over this, it doesn't affect the substantive
content of the proposed standard. And the question in question does seem like
something that's out of scope for the CODEC WG to answer in general anyway,
beyond giving the rationale for adding it in this case.

I'm happy to defer to whatever the broader community thinks will work best
on this one.

 Best,
 Ron


From internet-drafts@ietf.org  Tue May  1 08:10:42 2012
Return-Path: <internet-drafts@ietf.org>
X-Original-To: codec@ietfa.amsl.com
Delivered-To: codec@ietfa.amsl.com
Received: from localhost (localhost [127.0.0.1]) by ietfa.amsl.com (Postfix) with ESMTP id 8FCD321F8A78; Tue,  1 May 2012 08:10:42 -0700 (PDT)
X-Virus-Scanned: amavisd-new at amsl.com
X-Spam-Flag: NO
X-Spam-Score: -102.599
X-Spam-Level: 
X-Spam-Status: No, score=-102.599 tagged_above=-999 required=5 tests=[AWL=0.000, BAYES_00=-2.599, USER_IN_WHITELIST=-100]
Received: from mail.ietf.org ([12.22.58.30]) by localhost (ietfa.amsl.com [127.0.0.1]) (amavisd-new, port 10024) with ESMTP id kK2s1OK02rk2; Tue,  1 May 2012 08:10:37 -0700 (PDT)
Received: from ietfa.amsl.com (localhost [127.0.0.1]) by ietfa.amsl.com (Postfix) with ESMTP id 62C5C21F8A73; Tue,  1 May 2012 08:10:37 -0700 (PDT)
MIME-Version: 1.0
Content-Type: text/plain; charset="utf-8"
Content-Transfer-Encoding: quoted-printable
From: internet-drafts@ietf.org
To: i-d-announce@ietf.org
X-Test-IDTracker: no
X-IETF-IDTracker: 4.02
Message-ID: <20120501151037.25504.54085.idtracker@ietfa.amsl.com>
Date: Tue, 01 May 2012 08:10:37 -0700
Cc: codec@ietf.org
Subject: [codec] I-D Action: draft-ietf-codec-results-01.txt
X-BeenThere: codec@ietf.org
X-Mailman-Version: 2.1.12
Precedence: list
List-Id: Codec WG <codec.ietf.org>
List-Unsubscribe: <https://www.ietf.org/mailman/options/codec>, <mailto:codec-request@ietf.org?subject=unsubscribe>
List-Archive: <http://www.ietf.org/mail-archive/web/codec>
List-Post: <mailto:codec@ietf.org>
List-Help: <mailto:codec-request@ietf.org?subject=help>
List-Subscribe: <https://www.ietf.org/mailman/listinfo/codec>, <mailto:codec-request@ietf.org?subject=subscribe>
X-List-Received-Date: Tue, 01 May 2012 15:10:42 -0000

A New Internet-Draft is available from the on-line Internet-Drafts directories.
This draft is a work item of the Internet Wideband Audio Codec Working Group
of the IETF.

	Title           : Summary of Opus listening test results
	Author(s)       : Christian Hoene
                          Jean-Marc Valin
                          Koen Vos
                          Jan Skoglund
	Filename        : draft-ietf-codec-results-01.txt
	Pages           : 31
	Date            : 2012-05-01

   This document describes and examines listening test results obtained
   for the Opus codec and how they relate to the requirements.


A URL for this Internet-Draft is:
http://www.ietf.org/internet-drafts/draft-ietf-codec-results-01.txt

Internet-Drafts are also available by anonymous FTP at:
ftp://ftp.ietf.org/internet-drafts/

This Internet-Draft can be retrieved at:
ftp://ftp.ietf.org/internet-drafts/draft-ietf-codec-results-01.txt

The IETF datatracker page for this Internet-Draft is:
https://datatracker.ietf.org/doc/draft-ietf-codec-results/


From hoene@uni-tuebingen.de  Tue May  1 08:36:30 2012
Return-Path: <hoene@uni-tuebingen.de>
X-Original-To: codec@ietfa.amsl.com
Delivered-To: codec@ietfa.amsl.com
Received: from localhost (localhost [127.0.0.1]) by ietfa.amsl.com (Postfix) with ESMTP id 5484321F8603 for <codec@ietfa.amsl.com>; Tue,  1 May 2012 08:36:30 -0700 (PDT)
X-Virus-Scanned: amavisd-new at amsl.com
X-Spam-Flag: NO
X-Spam-Score: -0.949
X-Spam-Level: 
X-Spam-Status: No, score=-0.949 tagged_above=-999 required=5 tests=[AWL=1.300,  BAYES_00=-2.599, HELO_EQ_DE=0.35]
Received: from mail.ietf.org ([12.22.58.30]) by localhost (ietfa.amsl.com [127.0.0.1]) (amavisd-new, port 10024) with ESMTP id V1H4ldTYTniI for <codec@ietfa.amsl.com>; Tue,  1 May 2012 08:36:29 -0700 (PDT)
Received: from mx08.uni-tuebingen.de (mx08.uni-tuebingen.de [134.2.3.6]) by ietfa.amsl.com (Postfix) with ESMTP id 32A8821E8049 for <codec@ietf.org>; Tue,  1 May 2012 08:36:29 -0700 (PDT)
Received: from hoeneT60 (u-173-c040.cs.uni-tuebingen.de [134.2.173.40]) (authenticated bits=0) by mx08.uni-tuebingen.de (8.13.6/8.13.6) with ESMTP id q41FaRcq023232 (version=TLSv1/SSLv3 cipher=AES128-SHA bits=128 verify=NO) for <codec@ietf.org>; Tue, 1 May 2012 17:36:27 +0200
From: "Christian Hoene" <hoene@uni-tuebingen.de>
To: <codec@ietf.org>
References: <20120501151037.25504.54085.idtracker@ietfa.amsl.com>
In-Reply-To: <20120501151037.25504.54085.idtracker@ietfa.amsl.com>
Date: Tue, 1 May 2012 17:36:28 +0200
Organization: =?iso-8859-1?Q?Universit=E4t_T=FCbingen?=
Message-ID: <000401cd27b0$2d665160$8832f420$@uni-tuebingen.de>
MIME-Version: 1.0
Content-Type: text/plain; charset="iso-8859-1"
Content-Transfer-Encoding: quoted-printable
X-Mailer: Microsoft Outlook 14.0
Thread-Index: AQK2mkBevC1y92MHwVZACSlXIGHt4ZTiKfrQ
Content-Language: de
X-AntiVirus: NOT checked by Avira MailGate (version: 3.2.1.23; host: mx08)
Subject: Re: [codec] I-D Action: draft-ietf-codec-results-01.txt
X-BeenThere: codec@ietf.org
X-Mailman-Version: 2.1.12
Precedence: list
List-Id: Codec WG <codec.ietf.org>
List-Unsubscribe: <https://www.ietf.org/mailman/options/codec>, <mailto:codec-request@ietf.org?subject=unsubscribe>
List-Archive: <http://www.ietf.org/mail-archive/web/codec>
List-Post: <mailto:codec@ietf.org>
List-Help: <mailto:codec-request@ietf.org?subject=help>
List-Subscribe: <https://www.ietf.org/mailman/listinfo/codec>, <mailto:codec-request@ietf.org?subject=subscribe>
X-List-Received-Date: Tue, 01 May 2012 15:36:30 -0000

Hello,

This time I did not add anything new to the draft.

Results that are on my todo list are those of
* Kamedo
* the test traces
* the tests on software reliability
Any written text as input is highly appreciated.

After the codec draft has become an RFC, further characterization tests are
on the schedule of our test lab.

With best regards,

 Christian

--
Dr.-Ing. Christian Hoene, University of Tübingen,
Faculty of Science, Department of Computer Science,
Communication Networks, Sand 13, 72076 Tübingen, Germany,
Phone +49 7071 2970532, http://kn.inf.uni-tuebingen.de




From hartmans@mit.edu  Tue May  1 09:11:29 2012
Return-Path: <hartmans@mit.edu>
X-Original-To: codec@ietfa.amsl.com
Delivered-To: codec@ietfa.amsl.com
Received: from localhost (localhost [127.0.0.1]) by ietfa.amsl.com (Postfix) with ESMTP id 7A88521E82B0; Tue,  1 May 2012 09:11:29 -0700 (PDT)
X-Virus-Scanned: amavisd-new at amsl.com
X-Spam-Flag: NO
X-Spam-Score: -102.938
X-Spam-Level: 
X-Spam-Status: No, score=-102.938 tagged_above=-999 required=5 tests=[AWL=-0.673, BAYES_00=-2.599, IP_NOT_FRIENDLY=0.334, USER_IN_WHITELIST=-100]
Received: from mail.ietf.org ([12.22.58.30]) by localhost (ietfa.amsl.com [127.0.0.1]) (amavisd-new, port 10024) with ESMTP id StO+7LWroRep; Tue,  1 May 2012 09:11:28 -0700 (PDT)
Received: from permutation-city.suchdamage.org (permutation-city.suchdamage.org [69.25.196.28]) by ietfa.amsl.com (Postfix) with ESMTP id 6D90B21E8271; Tue,  1 May 2012 09:11:12 -0700 (PDT)
Received: from carter-zimmerman.suchdamage.org (carter-zimmerman.suchdamage.org [69.25.196.178]) (using TLSv1 with cipher DHE-RSA-AES256-SHA (256/256 bits)) (Client CN "laptop", Issuer "laptop" (not verified)) by mail.suchdamage.org (Postfix) with ESMTPS id C7B81201CB; Tue,  1 May 2012 12:07:10 -0400 (EDT)
Received: by carter-zimmerman.suchdamage.org (Postfix, from userid 8042) id D4C5A4769; Tue,  1 May 2012 12:10:58 -0400 (EDT)
From: Sam Hartman <hartmans-ietf@mit.edu>
To: Ron <ron@debian.org>
References: <6.2.5.6.2.20120430120153.0947ed48@resistor.net> <CBC4E0F3.867E4%stewe@stewe.org> <20120501014037.GA18009@audi.shelbyville.oz>
Date: Tue, 01 May 2012 12:10:58 -0400
In-Reply-To: <20120501014037.GA18009@audi.shelbyville.oz> (ron@debian.org's message of "Tue, 1 May 2012 11:10:37 +0930")
Message-ID: <tsllilbvq3x.fsf@mit.edu>
User-Agent: Gnus/5.110009 (No Gnus v0.9) Emacs/22.3 (gnu/linux)
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Cc: SM <sm@resistor.net>, "codec@ietf.org" <codec@ietf.org>, "codec-chairs@tools.ietf.org" <codec-chairs@tools.ietf.org>, "ietf@ietf.org" <ietf@ietf.org>
Subject: Re: [codec] Last Call: <draft-ietf-codec-opus-12.txt> (Definition of the Opus Audio Codec) to Proposed Standard
X-BeenThere: codec@ietf.org
X-Mailman-Version: 2.1.12
Precedence: list
List-Id: Codec WG <codec.ietf.org>
List-Unsubscribe: <https://www.ietf.org/mailman/options/codec>, <mailto:codec-request@ietf.org?subject=unsubscribe>
List-Archive: <http://www.ietf.org/mail-archive/web/codec>
List-Post: <mailto:codec@ietf.org>
List-Help: <mailto:codec-request@ietf.org?subject=help>
List-Subscribe: <https://www.ietf.org/mailman/listinfo/codec>, <mailto:codec-request@ietf.org?subject=subscribe>
X-List-Received-Date: Tue, 01 May 2012 16:11:29 -0000

For what it's worth, I support authors' (and this is actually one of the
few times I mean that rather than editors) right to make such a grant. I
believe the community is significantly better served by having
additional grants in the RFC, and strongly support us permitting them. I
believe our current policy allows them; if the community consensus is
that is not the case I support changes to the policy up to an
appropriate BCP to permit this.

From jbakercto@gmail.com  Tue May  1 12:33:40 2012
Return-Path: <jbakercto@gmail.com>
X-Original-To: codec@ietfa.amsl.com
Delivered-To: codec@ietfa.amsl.com
Received: from localhost (localhost [127.0.0.1]) by ietfa.amsl.com (Postfix) with ESMTP id E610421E80CE for <codec@ietfa.amsl.com>; Tue,  1 May 2012 12:33:40 -0700 (PDT)
X-Virus-Scanned: amavisd-new at amsl.com
X-Spam-Flag: NO
X-Spam-Score: -3.599
X-Spam-Level: 
X-Spam-Status: No, score=-3.599 tagged_above=-999 required=5 tests=[BAYES_00=-2.599, RCVD_IN_DNSWL_LOW=-1]
Received: from mail.ietf.org ([12.22.58.30]) by localhost (ietfa.amsl.com [127.0.0.1]) (amavisd-new, port 10024) with ESMTP id 96axKBtrwYOH for <codec@ietfa.amsl.com>; Tue,  1 May 2012 12:33:40 -0700 (PDT)
Received: from mail-we0-f172.google.com (mail-we0-f172.google.com [74.125.82.172]) by ietfa.amsl.com (Postfix) with ESMTP id DEFB321E808D for <codec@ietf.org>; Tue,  1 May 2012 12:33:39 -0700 (PDT)
Received: by werb10 with SMTP id b10so3187661wer.31 for <codec@ietf.org>; Tue, 01 May 2012 12:33:39 -0700 (PDT)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=mime-version:date:message-id:subject:from:to:content-type; bh=/YfbXRqX6oV/CIVG2zpun80hB7yxVCJWey/9/2tOUx8=; b=CKqKgQ9n526t32WAJaqgbt3ZcTXZsONJJyn8QmzEP4xfO24YPk6x75OqewB4T7pzi7 WsbGCH5NwnM0CjrgV2ppiJP357QkSu9Ma5RBvz2giGhNhyXLAjiKvPTAXyHlrxZDFWzr 0wrqKUlu5dkRuuuwBdW2nfFYXZTvsvIajm0tWmGPyG8VA2rmYl6NOUJueBcQcYrIbSV6 Dx3//CE07c0WNoCbjO3F6DCdEy0zFKKHDzONeQoU2D4LEBtz9xLuo1/Y54nj7OYylKq+ vqSLz2SWDWe8BchA+FeiZ0O3XUzbVAiVNUgt9onbgSYrVEnKWdCWwq+XZqPBVjNzjLqQ w/oQ==
MIME-Version: 1.0
Received: by 10.180.81.37 with SMTP id w5mr4042585wix.16.1335900818352; Tue, 01 May 2012 12:33:38 -0700 (PDT)
Received: by 10.194.23.132 with HTTP; Tue, 1 May 2012 12:33:38 -0700 (PDT)
Date: Tue, 1 May 2012 19:33:38 +0000
Message-ID: <CAOf0ZB_S+rB20NQCBB=tsO=5iti4HhnEZRTte2pJuz2O36uQ-A@mail.gmail.com>
From: Joe Baker <jbakercto@gmail.com>
To: codec@ietf.org
Content-Type: text/plain; charset=ISO-8859-1
Subject: [codec] Authors and disclosures, regarding Last Call: <draft-ietf-codec-opus-12.txt>
X-BeenThere: codec@ietf.org
X-Mailman-Version: 2.1.12
Precedence: list
List-Id: Codec WG <codec.ietf.org>
List-Unsubscribe: <https://www.ietf.org/mailman/options/codec>, <mailto:codec-request@ietf.org?subject=unsubscribe>
List-Archive: <http://www.ietf.org/mail-archive/web/codec>
List-Post: <mailto:codec@ietf.org>
List-Help: <mailto:codec-request@ietf.org?subject=help>
List-Subscribe: <https://www.ietf.org/mailman/listinfo/codec>, <mailto:codec-request@ietf.org?subject=subscribe>
X-List-Received-Date: Tue, 01 May 2012 20:15:52 -0000

The 'possible royalty/fee' in some of the IPR disclosures is unacceptable
for my applications.  After a rather time consuming review my team has
advised that I do not need to be concerned about the currently disclosed
non-RF patents but we were unsure about undisclosed applications.

Fortunately the majority of the codec components have been disclosed
to the public for many years in earlier drafts and open source releases
by the draft authors.  I assume that the closed source usage of these
components goes back even further.

Can the authors of the draft please indicate when the major components
of the codec were first offered to the public, in any form, so we can
better gauge the risk of substantial components being covered by
undisclosed applications?

If there are any royalty bearing components in the codec they must be
removed if the codec is to be usable for me.  Unfortunately, the
Huawei disclosures are so vague that they can hardly be considered
disclosures because they claim unpublished applications but do not
specify the subject matter.  These were only reported long after the
last call on the draft which might complicate removing anything in
response to them.  With such late and non-specific disclosures how
are we to express our preference to avoid non-RF encumbered
technologies?

I have audited the entire WG discussion archives, meeting minutes and
source code repositories and was not able to find any technical
contributions by anyone from Huawei.  I assume that the IETF policy
does not permit working group participants to follow the working group
and secretly patent the working group's inventions as their own?

It would be helpful if anyone here is aware that their specific technical
contributions will be covered by currently undisclosed applications which
may not have royalty free licensing if they would point them out to the
group so that we can decide if we should remove those parts.

I did notice one potentially important discrepancy.  There are many
more authors of the draft who are not listed on it as authors. Are
these people treated equally to the listed authors in terms of the IETF
IPR disclosure requirements?  I believe some of them have never posted to
this list although I may have failed to match some of the names.

+- Descriptive text ---------------------+
|         Author           Lines  Pct Msg|
+----------------------------------------+
|Timothy B. Terriberry      4803  60%  Y |
|Jean-Marc Valin            2428  30%  Y |
|Koen Vos                    473   6%  Y |
|Gregory Maxwell             141   2%  Y |
|Kat Walsh                   112   1%  Y |
|Benjamin M. Schwartz         48  <1%  Y |
|Ralph Giles                  10  <1%  Y |
|Total--------------------- 8015 100%  - |
+----------------------------------------+

+- Formal specification -----------------+
|         Author           Lines  Pct Msg|
+----------------------------------------+
|Jean-Marc Valin           21152  42%  Y |
|Gregory Maxwell           20497  41%  Y |
|Koen Vos                   4111   8%  Y |
|Timothy B. Terriberry      3057   6%  Y |
|Ralph Giles                1329   3%  Y |
|John Ridges                  24  <1%  Y |
|Benjamin M. Schwartz         20  <1%  Y |
|Karsten Vandborg Sorensen    12  <1%  Y |
|Wessel Lubberhuizen           7  <1%  N |
|Thorvald Natvig               7  <1%  Y |
|Alfred E. Heggestad           2  <1%  Y |
|Christian Hoene               2  <1%  Y |
|Benjamin Jemlich              2  <1%  N |
|David Schleef                 1  <1%  N |
|Kat Walsh                     1  <1%  Y |
|Total-------------------- 50224 100%  - |
+----------------------------------------+

The file AUTHORS in the repository also lists
Soren Skak Jensen and the file COPYING lists
Mark Borgerding and Erik de Castro Lopo but
I am not able to find any revisions or list
posts by them.

Thank you.

From jmvalin@mozilla.com  Tue May  1 17:16:43 2012
Return-Path: <jmvalin@mozilla.com>
X-Original-To: codec@ietfa.amsl.com
Delivered-To: codec@ietfa.amsl.com
Received: from localhost (localhost [127.0.0.1]) by ietfa.amsl.com (Postfix) with ESMTP id 3F16521E8086 for <codec@ietfa.amsl.com>; Tue,  1 May 2012 17:16:43 -0700 (PDT)
X-Virus-Scanned: amavisd-new at amsl.com
X-Spam-Flag: NO
X-Spam-Score: -6.599
X-Spam-Level: 
X-Spam-Status: No, score=-6.599 tagged_above=-999 required=5 tests=[BAYES_00=-2.599, RCVD_IN_DNSWL_MED=-4]
Received: from mail.ietf.org ([12.22.58.30]) by localhost (ietfa.amsl.com [127.0.0.1]) (amavisd-new, port 10024) with ESMTP id L-uuw+utuUFE for <codec@ietfa.amsl.com>; Tue,  1 May 2012 17:16:39 -0700 (PDT)
Received: from dm-mail03.mozilla.org (dm-mail03.mozilla.org [63.245.208.213]) by ietfa.amsl.com (Postfix) with ESMTP id 5061021E8020 for <codec@ietf.org>; Tue,  1 May 2012 17:16:38 -0700 (PDT)
Received: from [192.168.1.15] (modemcable014.207-160-184.mc.videotron.ca [184.160.207.14]) (Authenticated sender: jvalin@mozilla.com) by dm-mail03.mozilla.org (Postfix) with ESMTP id 3A6344AEDE1; Tue,  1 May 2012 17:16:38 -0700 (PDT)
Message-ID: <4FA07CE4.7090206@mozilla.com>
Date: Tue, 01 May 2012 20:16:36 -0400
From: Jean-Marc Valin <jmvalin@mozilla.com>
User-Agent: Mozilla/5.0 (X11; Linux i686 on x86_64; rv:12.0) Gecko/20120428 Thunderbird/12.0.1
MIME-Version: 1.0
To: Joe Baker <jbakercto@gmail.com>
References: <CAOf0ZB_S+rB20NQCBB=tsO=5iti4HhnEZRTte2pJuz2O36uQ-A@mail.gmail.com>
In-Reply-To: <CAOf0ZB_S+rB20NQCBB=tsO=5iti4HhnEZRTte2pJuz2O36uQ-A@mail.gmail.com>
Content-Type: text/plain; charset=ISO-8859-1
Content-Transfer-Encoding: 7bit
Cc: codec@ietf.org
Subject: Re: [codec] Authors and disclosures, regarding Last Call: <draft-ietf-codec-opus-12.txt>
X-BeenThere: codec@ietf.org
X-Mailman-Version: 2.1.12
Precedence: list
List-Id: Codec WG <codec.ietf.org>
List-Unsubscribe: <https://www.ietf.org/mailman/options/codec>, <mailto:codec-request@ietf.org?subject=unsubscribe>
List-Archive: <http://www.ietf.org/mail-archive/web/codec>
List-Post: <mailto:codec@ietf.org>
List-Help: <mailto:codec-request@ietf.org?subject=help>
List-Subscribe: <https://www.ietf.org/mailman/listinfo/codec>, <mailto:codec-request@ietf.org?subject=subscribe>
X-List-Received-Date: Wed, 02 May 2012 00:16:43 -0000

On 05/01/2012 03:33 PM, Joe Baker wrote:
> Can the authors of the draft please indicate when the major components
> of the codec were first offered to the public, in any form, so we can
> better gauge the risk of substantial components being covered by
> undisclosed applications?

Back when we had the first codec BoF (Jul 2009, Stockholm), both SILK
and CELT were already being deployed, and though they were further
modified after that point, the "core technology" hasn't changed that
much. As far as I can tell, SILK was "first offered to the public",
early 2009
(http://www.skypejournal.com/2009/02/silk-skype-new-audio-codec-sets-new.html),
while CELT was first released late 2007. Since January 2011, the CELT
layer has been completely frozen and the SILK layer only got minor fixes.

> I have audited the entire WG discussion archives, meeting minutes and
> source code repositories and was not able to find any technical
> contributions by anyone from Huawei.  I assume that the IETF policy
> does not permit working group participants to follow the working group
> and secretly patent the working group's inventions as their own?
> 
> It would be helpful if anyone here is aware that their specific technical
> contributions will be covered by currently undisclosed applications which
> may not have royalty free licensing if they would point them out to the
> group so that we can decide if we should remove those parts.

As far as I know, the contributions of Huawei have been limited to
providing feedback, comments, review, and requirements rather than code,
algorithms, or design.

> I did notice one potentially important discrepancy.  There are many
> more authors of the draft who are not listed on it as authors. Are
> these people treated equally to the listed authors in terms of the IETF
> IPR disclosure requirements?  I believe some of them have never posted to
> this list although I may have failed to match some of the names.

As far as I know, the IETF only allows about 5 authors to be listed at
the top of the draft, even though many more people may have contributed
to the document. You are correct that some people listed in the git logs
have never posted. There are two reasons for this. Some people have
merely sent us small bug fixes, draft typos, comment fixes and the like.
In the case of Mark Borgerding and Erik de Castro Lopo, they are authors of
pre-existing BSD code that we decided to include in Opus. This code
implements an FFT and fast float-integer conversion functions, so
nothing I'd be too worried about when it comes to IPR. As for Soren Skak
Jensen, he was a Skype employee, so his contributions are already covered.

On top of the names you found, there's actually one more significant
contribution. This was made by Raymond Chen on the mailing list. Because
I completely re-implemented that contribution in C, his name does not
show up in the git logs. However, Broadcom has properly disclosed IPR
and licensed it under the same license as the one Xiph.Org used.

Hope that answers your questions.

	Jean-Marc

From gmaxwell@juniper.net  Tue May  1 17:24:39 2012
Return-Path: <gmaxwell@juniper.net>
X-Original-To: codec@ietfa.amsl.com
Delivered-To: codec@ietfa.amsl.com
Received: from localhost (localhost [127.0.0.1]) by ietfa.amsl.com (Postfix) with ESMTP id 4965621F89B1 for <codec@ietfa.amsl.com>; Tue,  1 May 2012 17:24:39 -0700 (PDT)
X-Virus-Scanned: amavisd-new at amsl.com
X-Spam-Flag: NO
X-Spam-Score: -6.599
X-Spam-Level: 
X-Spam-Status: No, score=-6.599 tagged_above=-999 required=5 tests=[BAYES_00=-2.599, RCVD_IN_DNSWL_MED=-4]
Received: from mail.ietf.org ([12.22.58.30]) by localhost (ietfa.amsl.com [127.0.0.1]) (amavisd-new, port 10024) with ESMTP id nLMYksFA8W25 for <codec@ietfa.amsl.com>; Tue,  1 May 2012 17:24:38 -0700 (PDT)
Received: from exprod7og127.obsmtp.com (exprod7og127.obsmtp.com [64.18.2.210]) by ietfa.amsl.com (Postfix) with ESMTP id A012E21F89AE for <codec@ietf.org>; Tue,  1 May 2012 17:24:38 -0700 (PDT)
Received: from P-EMHUB02-HQ.jnpr.net ([66.129.224.36]) (using TLSv1) by exprod7ob127.postini.com ([64.18.6.12]) with SMTP ID DSNKT6B+xD99ltSMLSqH2XyUGmHa9Xp77fDz@postini.com; Tue, 01 May 2012 17:24:38 PDT
Received: from EMBX01-HQ.jnpr.net ([fe80::c821:7c81:f21f:8bc7]) by P-EMHUB02-HQ.jnpr.net ([fe80::88f9:77fd:dfc:4d51%11]) with mapi; Tue, 1 May 2012 17:23:51 -0700
From: Gregory Maxwell <gmaxwell@juniper.net>
To: Joe Baker <jbakercto@gmail.com>, "codec@ietf.org" <codec@ietf.org>
Date: Tue, 1 May 2012 17:22:53 -0700
Thread-Topic: [codec] Authors and disclosures,	regarding Last Call: <draft-ietf-codec-opus-12.txt>
Thread-Index: Ac0n1zZQgNyePzBHSTyi0LmWyueaUQAIoDuS
Message-ID: <BCB3F026FAC4C145A4A3330806FEFDA94086731B99@EMBX01-HQ.jnpr.net>
References: <CAOf0ZB_S+rB20NQCBB=tsO=5iti4HhnEZRTte2pJuz2O36uQ-A@mail.gmail.com>
In-Reply-To: <CAOf0ZB_S+rB20NQCBB=tsO=5iti4HhnEZRTte2pJuz2O36uQ-A@mail.gmail.com>
Accept-Language: en-US
Content-Language: en-US
X-MS-Has-Attach: 
X-MS-TNEF-Correlator: 
acceptlanguage: en-US
Content-Type: text/plain; charset="us-ascii"
Content-Transfer-Encoding: quoted-printable
MIME-Version: 1.0
Subject: Re: [codec] Authors and disclosures, regarding Last Call: <draft-ietf-codec-opus-12.txt>
X-BeenThere: codec@ietf.org
X-Mailman-Version: 2.1.12
Precedence: list
List-Id: Codec WG <codec.ietf.org>
List-Unsubscribe: <https://www.ietf.org/mailman/options/codec>, <mailto:codec-request@ietf.org?subject=unsubscribe>
List-Archive: <http://www.ietf.org/mail-archive/web/codec>
List-Post: <mailto:codec@ietf.org>
List-Help: <mailto:codec-request@ietf.org?subject=help>
List-Subscribe: <https://www.ietf.org/mailman/listinfo/codec>, <mailto:codec-request@ietf.org?subject=subscribe>
X-List-Received-Date: Wed, 02 May 2012 00:24:39 -0000

For the sake of historic accuracy,

> |Jean-Marc Valin           21152  42%  Y |
> |Gregory Maxwell           20497  41%  Y |
> |Koen Vos                   4111   8%  Y |
> |Timothy B. Terriberry      3057   6%  Y |

Even with an SCM it can be tricky to sort out who did what.

I didn't write half the codec, in particular git misattributes about 15,000
lines of the silk code to me because I corrected formatting issues. Much of
that should be assigned to the silk team. My contribution to the silk layer
mostly was finding bugs, removing code, dependencies on undefined behavior,
etc.

Similarly, a few hundred lines of Tim's code is misattributed to me because
he and the compiler have a disagreement about the importance of parentheses
around weakly binding operators like <<. At some point I reviewed and fixed
all the warnings. The numbers also look like they might include the testing
code which I wrote most of but isn't a part of the draft.
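
As an illustration of the attribution problem described above, here is a
minimal Python sketch of a per-author line tally. It is not the method used
to produce the quoted tables, which the thread does not describe. It assumes
the tally is built from `git blame --line-porcelain` output; the helper names
tally_blame and blame_file are hypothetical. The -w flag tells git blame to
ignore whitespace-only changes, which may reduce misattribution from pure
reformatting, but it would not help with edits such as added parentheses.

```python
import subprocess
from collections import Counter


def tally_blame(porcelain: str) -> Counter:
    """Count lines per author in `git blame --line-porcelain` output."""
    counts = Counter()
    for line in porcelain.splitlines():
        # Every blamed line carries an "author <name>" header. The source
        # line itself is prefixed with a tab, so it never matches here,
        # and "author-mail"/"author-time" lack the trailing space.
        if line.startswith("author "):
            counts[line[len("author "):]] += 1
    return counts


def blame_file(path: str, ignore_whitespace: bool = True) -> Counter:
    """Run git blame on one file (assumes the cwd is inside a git repo)."""
    cmd = ["git", "blame", "--line-porcelain"]
    if ignore_whitespace:
        cmd.append("-w")  # skip whitespace-only changes when assigning blame
    cmd += ["--", path]
    result = subprocess.run(cmd, capture_output=True, text=True, check=True)
    return tally_blame(result.stdout)
```

Comparing blame_file(path, ignore_whitespace=False) against the default for
the same file gives a rough idea of how many lines a formatting-only commit
moved from one author to another.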

These sorts of line counting metrics don't rightfully attribute many of the
contributions which didn't directly change the implementation or the text--
like the fact that Tim wrote a nearly complete second implementation of the
codec, significant software testing by myself, psycho-acoustic testing plus
tuning by Monty, Igor C., Kat, and others, or Ben Schwartz's consistent and
thoughtful shed-painting of the entire codec design (he convinced Jean-Marc
that the mode switching could be made glitchless without boiling the oceans
by figuring out how to do it with minimal modifications and writing running
code for it, though that wasn't the specific code that made it in).


From kpfleming@digium.com  Wed May  2 05:44:59 2012
Return-Path: <kpfleming@digium.com>
X-Original-To: codec@ietfa.amsl.com
Delivered-To: codec@ietfa.amsl.com
Received: from localhost (localhost [127.0.0.1]) by ietfa.amsl.com (Postfix) with ESMTP id 853F021F87C4 for <codec@ietfa.amsl.com>; Wed,  2 May 2012 05:44:59 -0700 (PDT)
X-Virus-Scanned: amavisd-new at amsl.com
X-Spam-Flag: NO
X-Spam-Score: -106.341
X-Spam-Level: 
X-Spam-Status: No, score=-106.341 tagged_above=-999 required=5 tests=[AWL=0.258, BAYES_00=-2.599, RCVD_IN_DNSWL_MED=-4, USER_IN_WHITELIST=-100]
Received: from mail.ietf.org ([12.22.58.30]) by localhost (ietfa.amsl.com [127.0.0.1]) (amavisd-new, port 10024) with ESMTP id 3FWPNNmPoAbf for <codec@ietfa.amsl.com>; Wed,  2 May 2012 05:44:58 -0700 (PDT)
Received: from mail.digium.com (mail.digium.com [216.207.245.2]) by ietfa.amsl.com (Postfix) with ESMTP id EC33E21F87BF for <codec@ietf.org>; Wed,  2 May 2012 05:44:57 -0700 (PDT)
Received: from [10.24.55.203] (helo=zimbra.hsv.digium.com) by mail.digium.com with esmtp (Exim 4.69) (envelope-from <kpfleming@digium.com>) id 1SPYvQ-00017x-JV for codec@ietf.org; Wed, 02 May 2012 07:44:56 -0500
Received: from localhost (localhost.localdomain [127.0.0.1]) by zimbra.hsv.digium.com (Postfix) with ESMTP id 98A7CD8004 for <codec@ietf.org>; Wed,  2 May 2012 07:44:56 -0500 (CDT)
Received: from zimbra.hsv.digium.com ([127.0.0.1]) by localhost (zimbra.hsv.digium.com [127.0.0.1]) (amavisd-new, port 10024) with ESMTP id 7LbJSqZZzuw9 for <codec@ietf.org>; Wed,  2 May 2012 07:44:51 -0500 (CDT)
Received: from [192.168.1.5] (173-18-150-64.client.mchsi.com [173.18.150.64]) by zimbra.hsv.digium.com (Postfix) with ESMTPSA id 8ED1DD8002 for <codec@ietf.org>; Wed,  2 May 2012 07:44:51 -0500 (CDT)
Message-ID: <4FA12C35.9000703@digium.com>
Date: Wed, 02 May 2012 07:44:37 -0500
From: "Kevin P. Fleming" <kpfleming@digium.com>
Organization: Digium, Inc.
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:11.0) Gecko/20120410 Thunderbird/11.0.1
MIME-Version: 1.0
To: codec@ietf.org
References: <CAOf0ZB_S+rB20NQCBB=tsO=5iti4HhnEZRTte2pJuz2O36uQ-A@mail.gmail.com> <4FA07CE4.7090206@mozilla.com>
In-Reply-To: <4FA07CE4.7090206@mozilla.com>
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Content-Transfer-Encoding: 7bit
Subject: Re: [codec] Authors and disclosures, regarding Last Call: <draft-ietf-codec-opus-12.txt>
X-BeenThere: codec@ietf.org
X-Mailman-Version: 2.1.12
Precedence: list
List-Id: Codec WG <codec.ietf.org>
List-Unsubscribe: <https://www.ietf.org/mailman/options/codec>, <mailto:codec-request@ietf.org?subject=unsubscribe>
List-Archive: <http://www.ietf.org/mail-archive/web/codec>
List-Post: <mailto:codec@ietf.org>
List-Help: <mailto:codec-request@ietf.org?subject=help>
List-Subscribe: <https://www.ietf.org/mailman/listinfo/codec>, <mailto:codec-request@ietf.org?subject=subscribe>
X-List-Received-Date: Wed, 02 May 2012 12:44:59 -0000

On 05/01/2012 07:16 PM, Jean-Marc Valin wrote:
> On 05/01/2012 03:33 PM, Joe Baker wrote:
>> >  Can the authors of the draft please indicate when the major components
>> >  of the codec were first offered to the public, in any form, so we can
>> >  better gauge the risk of substantial components being covered by
>> >  undisclosed applications?
> Back when we had the first codec BoF (Jul 2009, Stockholm), both SILK
> and CELT were already being deployed, and though they were further
> modified after that point, the "core technology" hasn't changed that
> much. As far as I can tell, SILK was "first offered to the public",
> early 2009
> (http://www.skypejournal.com/2009/02/silk-skype-new-audio-codec-sets-new.html),
> while CELT was first released late 2007. Since January 2011, the CELT
> layer has been completely frozen and the SILK layer only got minor fixes.

While it's true that SILK was available at that time, it was only 
available in binary form embedded inside products delivered by Skype, 
and those products heavily obscure and encrypt their behavior. It seems 
unlikely that anyone who had IPR in any of its methods would have been 
able to determine that SILK was potentially infringing before its source 
code was made available. I would consider the 'first public disclosure' 
date to be the first time the source code was published.

-- 
Kevin P. Fleming
Digium, Inc. | Director of Software Technologies
Jabber: kfleming@digium.com | SIP: kpfleming@digium.com | Skype: kpfleming
445 Jan Davis Drive NW - Huntsville, AL 35806 - USA
Check us out at www.digium.com & www.asterisk.org

From jmvalin@jmvalin.ca  Wed May  2 06:40:01 2012
Return-Path: <jmvalin@jmvalin.ca>
X-Original-To: codec@ietfa.amsl.com
Delivered-To: codec@ietfa.amsl.com
Received: from localhost (localhost [127.0.0.1]) by ietfa.amsl.com (Postfix) with ESMTP id 0A0F921F85DD for <codec@ietfa.amsl.com>; Wed,  2 May 2012 06:40:01 -0700 (PDT)
X-Virus-Scanned: amavisd-new at amsl.com
X-Spam-Flag: NO
X-Spam-Score: -2.599
X-Spam-Level: 
X-Spam-Status: No, score=-2.599 tagged_above=-999 required=5 tests=[BAYES_00=-2.599]
Received: from mail.ietf.org ([12.22.58.30]) by localhost (ietfa.amsl.com [127.0.0.1]) (amavisd-new, port 10024) with ESMTP id u6yl4F6N5QMl for <codec@ietfa.amsl.com>; Wed,  2 May 2012 06:40:00 -0700 (PDT)
Received: from relais.videotron.ca (relais.videotron.ca [24.201.245.36]) by ietfa.amsl.com (Postfix) with ESMTP id 7F01C21F85D3 for <codec@ietf.org>; Wed,  2 May 2012 06:40:00 -0700 (PDT)
MIME-version: 1.0
Content-transfer-encoding: 7BIT
Content-type: text/plain; CHARSET=US-ASCII
Received: from [192.168.1.14] ([70.83.239.48]) by VL-VM-MR005.ip.videotron.ca (Oracle Communications Messaging Exchange Server 7u4-22.01 64bit (built Apr 21 2011)) with ESMTP id <0M3E00JUTDYN3V20@VL-VM-MR005.ip.videotron.ca> for codec@ietf.org; Wed, 02 May 2012 09:39:59 -0400 (EDT)
Message-id: <4FA1392E.8040702@jmvalin.ca>
Date: Wed, 02 May 2012 09:39:58 -0400
From: Jean-Marc Valin <jmvalin@jmvalin.ca>
User-Agent: Mozilla/5.0 (X11; Linux i686 on x86_64; rv:12.0) Gecko/20120428 Thunderbird/12.0.1
To: "Kevin P. Fleming" <kpfleming@digium.com>
References: <CAOf0ZB_S+rB20NQCBB=tsO=5iti4HhnEZRTte2pJuz2O36uQ-A@mail.gmail.com> <4FA07CE4.7090206@mozilla.com> <4FA12C35.9000703@digium.com>
In-reply-to: <4FA12C35.9000703@digium.com>
X-Enigmail-Version: 1.4.1
Cc: codec@ietf.org
Subject: Re: [codec] Authors and disclosures, regarding Last Call: <draft-ietf-codec-opus-12.txt>
X-BeenThere: codec@ietf.org
X-Mailman-Version: 2.1.12
Precedence: list
List-Id: Codec WG <codec.ietf.org>
List-Unsubscribe: <https://www.ietf.org/mailman/options/codec>, <mailto:codec-request@ietf.org?subject=unsubscribe>
List-Archive: <http://www.ietf.org/mail-archive/web/codec>
List-Post: <mailto:codec@ietf.org>
List-Help: <mailto:codec-request@ietf.org?subject=help>
List-Subscribe: <https://www.ietf.org/mailman/listinfo/codec>, <mailto:codec-request@ietf.org?subject=subscribe>
X-List-Received-Date: Wed, 02 May 2012 13:40:01 -0000

On 12-05-02 08:44 AM, Kevin P. Fleming wrote:
> While it's true that SILK was available at that time, it was only
> available in binary form embedded inside products delivered by Skype,
> and those products heavily obscure and encrypt their behavior. It seems
> unlikely that anyone who had IPR in any of its methods would have been
> able to determine that SILK was potentially infringing before its source
> code was made available. I would consider the 'first public disclosure'
> date to be the first time the source code was published.

As far as I can tell, the actual source code was released as open source
in March 2010, though it may have been available before that to
developers willing to sign some agreement. Also, fortunately prior art
gets established based on the time Silk was available even in obfuscated
form.

	Jean-Marc

From kpfleming@digium.com  Wed May  2 07:39:52 2012
Return-Path: <kpfleming@digium.com>
X-Original-To: codec@ietfa.amsl.com
Delivered-To: codec@ietfa.amsl.com
Received: from localhost (localhost [127.0.0.1]) by ietfa.amsl.com (Postfix) with ESMTP id C57B521F864E for <codec@ietfa.amsl.com>; Wed,  2 May 2012 07:39:52 -0700 (PDT)
X-Virus-Scanned: amavisd-new at amsl.com
X-Spam-Flag: NO
X-Spam-Score: -105.824
X-Spam-Level: 
X-Spam-Status: No, score=-105.824 tagged_above=-999 required=5 tests=[AWL=-0.517, BAYES_00=-2.599, MISSING_HEADERS=1.292, RCVD_IN_DNSWL_MED=-4, USER_IN_WHITELIST=-100]
Received: from mail.ietf.org ([12.22.58.30]) by localhost (ietfa.amsl.com [127.0.0.1]) (amavisd-new, port 10024) with ESMTP id H4dldIzL-mW2 for <codec@ietfa.amsl.com>; Wed,  2 May 2012 07:39:52 -0700 (PDT)
Received: from mail.digium.com (mail.digium.com [216.207.245.2]) by ietfa.amsl.com (Postfix) with ESMTP id 2AE6621F864D for <codec@ietf.org>; Wed,  2 May 2012 07:39:52 -0700 (PDT)
Received: from [10.24.55.203] (helo=zimbra.hsv.digium.com) by mail.digium.com with esmtp (Exim 4.69) (envelope-from <kpfleming@digium.com>) id 1SPaid-0002v4-Qq for codec@ietf.org; Wed, 02 May 2012 09:39:51 -0500
Received: from localhost (localhost.localdomain [127.0.0.1]) by zimbra.hsv.digium.com (Postfix) with ESMTP id CC89FD8004 for <codec@ietf.org>; Wed,  2 May 2012 09:39:51 -0500 (CDT)
Received: from zimbra.hsv.digium.com ([127.0.0.1]) by localhost (zimbra.hsv.digium.com [127.0.0.1]) (amavisd-new, port 10024) with ESMTP id ToKCp6RnKSjA for <codec@ietf.org>; Wed,  2 May 2012 09:39:51 -0500 (CDT)
Received: from [10.24.250.46] (unknown [10.24.250.46]) by zimbra.hsv.digium.com (Postfix) with ESMTPSA id 7B7ACD8002 for <codec@ietf.org>; Wed,  2 May 2012 09:39:51 -0500 (CDT)
Message-ID: <4FA14729.20708@digium.com>
Date: Wed, 02 May 2012 09:39:37 -0500
From: "Kevin P. Fleming" <kpfleming@digium.com>
Organization: Digium, Inc.
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:11.0) Gecko/20120410 Thunderbird/11.0.1
MIME-Version: 1.0
CC: codec@ietf.org
References: <CAOf0ZB_S+rB20NQCBB=tsO=5iti4HhnEZRTte2pJuz2O36uQ-A@mail.gmail.com> <4FA07CE4.7090206@mozilla.com> <4FA12C35.9000703@digium.com> <4FA1392E.8040702@jmvalin.ca>
In-Reply-To: <4FA1392E.8040702@jmvalin.ca>
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Content-Transfer-Encoding: 7bit
Subject: Re: [codec] Authors and disclosures, regarding Last Call: <draft-ietf-codec-opus-12.txt>
X-BeenThere: codec@ietf.org
X-Mailman-Version: 2.1.12
Precedence: list
List-Id: Codec WG <codec.ietf.org>
List-Unsubscribe: <https://www.ietf.org/mailman/options/codec>, <mailto:codec-request@ietf.org?subject=unsubscribe>
List-Archive: <http://www.ietf.org/mail-archive/web/codec>
List-Post: <mailto:codec@ietf.org>
List-Help: <mailto:codec-request@ietf.org?subject=help>
List-Subscribe: <https://www.ietf.org/mailman/listinfo/codec>, <mailto:codec-request@ietf.org?subject=subscribe>
X-List-Received-Date: Wed, 02 May 2012 14:39:52 -0000

On 05/02/2012 08:39 AM, Jean-Marc Valin wrote:
>
>
> On 12-05-02 08:44 AM, Kevin P. Fleming wrote:
>> While it's true that SILK was available at that time, it was only
>> available in binary form embedded inside products delivered by Skype,
>> and those products heavily obscure and encrypt their behavior. It seems
>> unlikely that anyone who had IPR in any of its methods would have been
>> able to determine that SILK was potentially infringing before its source
>> code was made available. I would consider the 'first public disclosure'
>> date to be the first time the source code was published.
>
> As far as I can tell, the actual source code was released as open source
> in March 2010, though it may have been available before that to
> developers willing to sign some agreement. Also, fortunately prior art
> gets established based on the time Silk was available even in obfuscated
> form.

In the legal sense that is absolutely true, sure. The OP's comment 
though was that he wasn't sure whether potential IPR holders had had 
enough time to determine whether SILK/OPUS may be infringing on their 
IPR. In that case, the extra year of SILK availability in binary-only 
obfuscated form likely won't make much difference.

Also, we were one of the licensees of SILK prior to the source code 
publication (although we never released any products using it) and the 
codec was only offered to us in binary form.

-- 
Kevin P. Fleming
Digium, Inc. | Director of Software Technologies
Jabber: kfleming@digium.com | SIP: kpfleming@digium.com | Skype: kpfleming
445 Jan Davis Drive NW - Huntsville, AL 35806 - USA
Check us out at www.digium.com & www.asterisk.org

From sm@resistor.net  Fri Apr 27 02:46:18 2012
Return-Path: <sm@resistor.net>
X-Original-To: codec@ietfa.amsl.com
Delivered-To: codec@ietfa.amsl.com
Received: from localhost (localhost [127.0.0.1]) by ietfa.amsl.com (Postfix) with ESMTP id 988F121F883B for <codec@ietfa.amsl.com>; Fri, 27 Apr 2012 02:46:18 -0700 (PDT)
X-Virus-Scanned: amavisd-new at amsl.com
X-Spam-Flag: NO
X-Spam-Score: -102.553
X-Spam-Level: 
X-Spam-Status: No, score=-102.553 tagged_above=-999 required=5 tests=[AWL=0.046, BAYES_00=-2.599, USER_IN_WHITELIST=-100]
Received: from mail.ietf.org ([12.22.58.30]) by localhost (ietfa.amsl.com [127.0.0.1]) (amavisd-new, port 10024) with ESMTP id nMconowclUgG for <codec@ietfa.amsl.com>; Fri, 27 Apr 2012 02:46:18 -0700 (PDT)
Received: from mx.ipv6.elandsys.com (mx.ipv6.elandsys.com [IPv6:2001:470:f329:1::1]) by ietfa.amsl.com (Postfix) with ESMTP id 046E921F8600 for <codec@ietf.org>; Fri, 27 Apr 2012 02:46:17 -0700 (PDT)
Received: from SUBMAN.resistor.net (IDENT:sm@localhost [127.0.0.1]) (authenticated bits=0) by mx.elandsys.com (8.14.5/8.14.5) with ESMTP id q3R9kB0Z010976 for <codec@ietf.org>; Fri, 27 Apr 2012 02:46:14 -0700 (PDT)
DKIM-Signature: v=1; a=rsa-sha256; c=simple/simple; d=opendkim.org; s=mail2010; t=1335519976; i=@resistor.net; bh=FoDJQkLiL93oFvgy9FWNJN33YaIVkn8A0iOCcWKqs5M=; h=Date:To:From:Subject:In-Reply-To:References:Cc; b=ReH8lPZhhzt1aPROkveD0XN+EfwIG5IGfNVmd88/59lheXUgNhhp+9iJWMqjVLk0L dHe3b0nBiKCIVzXYTVva7Eb7n6r8Ua01TCJbxkNCCxPCcKaBTue33nkHoYN26eTnAL OYaJfrSEE+edxRnawVxq0/KzbNAif97mU7HDdXcA=
DKIM-Signature: v=1; a=rsa-sha256; c=simple/simple; d=resistor.net; s=mail; t=1335519976; i=@resistor.net; bh=FoDJQkLiL93oFvgy9FWNJN33YaIVkn8A0iOCcWKqs5M=; h=Date:To:From:Subject:In-Reply-To:References:Cc; b=JtM4PjS+DlsdYtDnmWnCxkgvEAsSGlbRFxEJJtXahv/rpqfwhr2ch+jCeDlIN3eLG zimM4LGpDUFldT/R2duLOamYxBVhnKqsoalx/pXETeydbpe/jI4fTIo0/e1UIzITqA sOpVZV7iTkK1CFK5fE1XasDngpwXxPDtrx3KRtVA=
Message-Id: <6.2.5.6.2.20120427022121.0aa1fec8@resistor.net>
X-Mailer: QUALCOMM Windows Eudora Version 6.2.5.6
Date: Fri, 27 Apr 2012 02:43:32 -0700
To: codec@ietf.org
From: SM <sm@resistor.net>
In-Reply-To: <20120426202056.15659.50524.idtracker@ietfa.amsl.com>
References: <20120426202056.15659.50524.idtracker@ietfa.amsl.com>
Mime-Version: 1.0
Content-Type: text/plain; charset="us-ascii"; format=flowed
X-Mailman-Approved-At: Thu, 03 May 2012 08:46:35 -0700
Subject: Re: [codec] Last Call: <draft-ietf-codec-opus-12.txt> (Definition of the Opus Audio Codec) to Proposed Standard
X-BeenThere: codec@ietf.org
X-Mailman-Version: 2.1.12
Precedence: list
List-Id: Codec WG <codec.ietf.org>
List-Unsubscribe: <https://www.ietf.org/mailman/options/codec>, <mailto:codec-request@ietf.org?subject=unsubscribe>
List-Archive: <http://www.ietf.org/mail-archive/web/codec>
List-Post: <mailto:codec@ietf.org>
List-Help: <mailto:codec-request@ietf.org?subject=help>
List-Subscribe: <https://www.ietf.org/mailman/listinfo/codec>, <mailto:codec-request@ietf.org?subject=subscribe>
X-List-Received-Date: Fri, 27 Apr 2012 09:46:18 -0000

At 13:20 26-04-2012, The IESG wrote:
>The IESG has received a request from the Internet Wideband Audio Codec WG
>(codec) to consider the following document:
>- 'Definition of the Opus Audio Codec'
>   <draft-ietf-codec-opus-12.txt> as a Proposed Standard

Section 10 about "copying conditions" mentions "without 
royalty".  There was a message to the CODEC mailing list about the 
Qualcomm IPR disclosure.  The licensing declaration in the Huawei IPR 
disclosure does not mention royalty-free.  Was that taken into account 
by the authors for the statement in Section 10?

Could the authors please clarify what they mean by "the work" in the following:

   "The authors agree to grant third parties the irrevocable right to
    copy, use and distribute the work (excluding Code Components
    available under the simplified BSD license)"

Regards,
-sm 


From sm@resistor.net  Tue May  1 00:13:03 2012
Return-Path: <sm@resistor.net>
X-Original-To: codec@ietfa.amsl.com
Delivered-To: codec@ietfa.amsl.com
Received: from localhost (localhost [127.0.0.1]) by ietfa.amsl.com (Postfix) with ESMTP id 635A421F864A; Tue,  1 May 2012 00:13:03 -0700 (PDT)
X-Virus-Scanned: amavisd-new at amsl.com
X-Spam-Flag: NO
X-Spam-Score: -102.558
X-Spam-Level: 
X-Spam-Status: No, score=-102.558 tagged_above=-999 required=5 tests=[AWL=0.041, BAYES_00=-2.599, USER_IN_WHITELIST=-100]
Received: from mail.ietf.org ([12.22.58.30]) by localhost (ietfa.amsl.com [127.0.0.1]) (amavisd-new, port 10024) with ESMTP id GmHL2nu+M0jl; Tue,  1 May 2012 00:13:01 -0700 (PDT)
Received: from mx.ipv6.elandsys.com (mx.ipv6.elandsys.com [IPv6:2001:470:f329:1::1]) by ietfa.amsl.com (Postfix) with ESMTP id 1E4C221F8657; Tue,  1 May 2012 00:13:01 -0700 (PDT)
Received: from SUBMAN.resistor.net (IDENT:sm@localhost [127.0.0.1]) (authenticated bits=0) by mx.elandsys.com (8.14.5/8.14.5) with ESMTP id q417CgZO001861; Tue, 1 May 2012 00:12:47 -0700 (PDT)
DKIM-Signature: v=1; a=rsa-sha256; c=simple/simple; d=opendkim.org; s=mail2010; t=1335856372; i=@resistor.net; bh=s9CU/SD9U/YQBdSyTK95nyGgOOCt6uC17RO9blwnWJo=; h=Date:To:From:Subject:Cc:In-Reply-To:References; b=Bl6m9xNlkYSL3o5MJrkE7/3OsNMdqC9ns08U44tonC5bZJZCm+bKKE5uhmAt6+9Ci NgbSK8DOL+rt+BUR05MC6/J09ksAuGiHoAmiV44P0CqpOPJJ4546UtsOkapPHLCXIB nh1tnCgQKdB9lO56dtND/eyj8N3stJ/Rt87uK/s4=
DKIM-Signature: v=1; a=rsa-sha256; c=simple/simple; d=resistor.net; s=mail; t=1335856372; i=@resistor.net; bh=s9CU/SD9U/YQBdSyTK95nyGgOOCt6uC17RO9blwnWJo=; h=Date:To:From:Subject:Cc:In-Reply-To:References; b=riH8DR4+BDzwvcuSsamLZZnzQ/AWq1vpZXit5qqaACGMG5eqPKkulxcGhrktLQ3Sh U4DWLSaJCyyGke8L9Dl8WwyxvGzH1aZZoxZA4eYxMSq5G7szVMXG7z4Gb/b79aaxHg gk8Gbag13d5yEl8jjdflMB47FYjVaXddXISzdxqU=
Message-Id: <6.2.5.6.2.20120430223624.0c706828@resistor.net>
X-Mailer: QUALCOMM Windows Eudora Version 6.2.5.6
Date: Mon, 30 Apr 2012 23:28:59 -0700
To: Stephan Wenger <stewe@stewe.org>, Ron <ron@debian.org>, ietf@ietf.org
From: SM <sm@resistor.net>
In-Reply-To: <CBC4E0F3.867E4%stewe@stewe.org>
References: <6.2.5.6.2.20120430120153.0947ed48@resistor.net> <CBC4E0F3.867E4%stewe@stewe.org>
Mime-Version: 1.0
Content-Type: text/plain; charset="us-ascii"; format=flowed
X-Mailman-Approved-At: Thu, 03 May 2012 08:46:35 -0700
Cc: codec@ietf.org
Subject: Re: [codec] Last Call: <draft-ietf-codec-opus-12.txt> (Definition of the Opus Audio Codec) to Proposed Standard
X-BeenThere: codec@ietf.org
X-Mailman-Version: 2.1.12
Precedence: list
List-Id: Codec WG <codec.ietf.org>
List-Unsubscribe: <https://www.ietf.org/mailman/options/codec>, <mailto:codec-request@ietf.org?subject=unsubscribe>
List-Archive: <http://www.ietf.org/mail-archive/web/codec>
List-Post: <mailto:codec@ietf.org>
List-Help: <mailto:codec-request@ietf.org?subject=help>
List-Subscribe: <https://www.ietf.org/mailman/listinfo/codec>, <mailto:codec-request@ietf.org?subject=subscribe>
X-List-Received-Date: Tue, 01 May 2012 07:13:03 -0000

At 15:53 30-04-2012, Stephan Wenger wrote:
>This subject was also raised by our AD on the codec mailing list.  The
>statement is about spec text copyright (with the possible exception of the
>word "use", which is loaded in this context, see BSD license and implicit
>patent grant ambiguity).  Insofar, the patent licensing statement received
>appear to be irrelevant to this discussion.

Ok.

At 18:40 30-04-2012, Ron wrote:
>If this clause becomes a blocker, then we should simply remove it, but in that
>case it would be good to have clear reasons why it became a blocker, since the
>things you say you fear here, I see as already being prohibited anyway.

The text in Section 10 is ambiguous.

Given all the efforts that went into RFC 6569, it's odd to see the 
text being discussed during the Last Call instead of the WGLC.

Regards,
-sm  


From koen.vos@skype.net  Thu May  3 15:57:27 2012
Return-Path: <koen.vos@skype.net>
X-Original-To: codec@ietfa.amsl.com
Delivered-To: codec@ietfa.amsl.com
Received: from localhost (localhost [127.0.0.1]) by ietfa.amsl.com (Postfix) with ESMTP id 15DBA21F8700 for <codec@ietfa.amsl.com>; Thu,  3 May 2012 15:57:27 -0700 (PDT)
X-Virus-Scanned: amavisd-new at amsl.com
X-Spam-Flag: NO
X-Spam-Score: -3.599
X-Spam-Level: 
X-Spam-Status: No, score=-3.599 tagged_above=-999 required=5 tests=[BAYES_00=-2.599, RCVD_IN_DNSWL_LOW=-1]
Received: from mail.ietf.org ([12.22.58.30]) by localhost (ietfa.amsl.com [127.0.0.1]) (amavisd-new, port 10024) with ESMTP id 5YKtQWpr4lbp for <codec@ietfa.amsl.com>; Thu,  3 May 2012 15:57:26 -0700 (PDT)
Received: from db3outboundpool.messaging.microsoft.com (db3ehsobe002.messaging.microsoft.com [213.199.154.140]) by ietfa.amsl.com (Postfix) with ESMTP id 190D721F8709 for <codec@ietf.org>; Thu,  3 May 2012 15:57:25 -0700 (PDT)
Received: from mail90-db3-R.bigfish.com (10.3.81.254) by DB3EHSOBE002.bigfish.com (10.3.84.22) with Microsoft SMTP Server id 14.1.225.23; Thu, 3 May 2012 22:57:14 +0000
Received: from mail90-db3 (localhost [127.0.0.1])	by mail90-db3-R.bigfish.com (Postfix) with ESMTP id A481746043B; Thu,  3 May 2012 22:57:14 +0000 (UTC)
X-SpamScore: -27
X-BigFish: VS-27(zz9371I936eK98dKzz1202hzz1033IL8275dhz2fh87h2a8h668h839h944hd25he96h)
X-Forefront-Antispam-Report: CIP:131.107.125.8; KIP:(null); UIP:(null); IPV:NLI; H:TK5EX14HUBC106.redmond.corp.microsoft.com; RD:none; EFVD:NLI
Received-SPF: pass (mail90-db3: domain of skype.net designates 131.107.125.8 as permitted sender) client-ip=131.107.125.8; envelope-from=koen.vos@skype.net; helo=TK5EX14HUBC106.redmond.corp.microsoft.com ; icrosoft.com ; 
X-FB-DOMAIN-IP-MATCH: fail
Received: from mail90-db3 (localhost.localdomain [127.0.0.1]) by mail90-db3 (MessageSwitch) id 1336085831958388_11418; Thu,  3 May 2012 22:57:11 +0000 (UTC)
Received: from DB3EHSMHS010.bigfish.com (unknown [10.3.81.241])	by mail90-db3.bigfish.com (Postfix) with ESMTP id DBECD4E005D; Thu,  3 May 2012 22:57:11 +0000 (UTC)
Received: from TK5EX14HUBC106.redmond.corp.microsoft.com (131.107.125.8) by DB3EHSMHS010.bigfish.com (10.3.87.110) with Microsoft SMTP Server (TLS) id 14.1.225.23; Thu, 3 May 2012 22:57:11 +0000
Received: from TK5EX14MBXC253.redmond.corp.microsoft.com ([169.254.3.188]) by TK5EX14HUBC106.redmond.corp.microsoft.com ([157.54.80.61]) with mapi id 14.02.0298.005; Thu, 3 May 2012 22:57:19 +0000
From: Koen Vos <koen.vos@skype.net>
To: Jean-Marc Valin <jmvalin@jmvalin.ca>, "Kevin P. Fleming" <kpfleming@digium.com>
Thread-Topic: [codec] Authors and disclosures, regarding Last Call: <draft-ietf-codec-opus-12.txt>
Thread-Index: AQHNKGkc4Qi6EWhhiEuayjDA4jUg5Ja4rR3D
Date: Thu, 3 May 2012 22:57:19 +0000
Message-ID: <D79146E3783B6942A3E8BC43352BBB460575422C@TK5EX14MBXC253.redmond.corp.microsoft.com>
References: <CAOf0ZB_S+rB20NQCBB=tsO=5iti4HhnEZRTte2pJuz2O36uQ-A@mail.gmail.com> <4FA07CE4.7090206@mozilla.com> <4FA12C35.9000703@digium.com>,<4FA1392E.8040702@jmvalin.ca>
In-Reply-To: <4FA1392E.8040702@jmvalin.ca>
Accept-Language: en-US
Content-Language: en-US
X-MS-Has-Attach: 
X-MS-TNEF-Correlator: 
x-originating-ip: [157.54.51.35]
Content-Type: text/plain; charset="us-ascii"
Content-Transfer-Encoding: quoted-printable
MIME-Version: 1.0
X-OriginatorOrg: microsoft.com
Cc: "codec@ietf.org" <codec@ietf.org>
Subject: Re: [codec] Authors and disclosures, regarding Last Call: <draft-ietf-codec-opus-12.txt>
X-BeenThere: codec@ietf.org
X-Mailman-Version: 2.1.12
Precedence: list
List-Id: Codec WG <codec.ietf.org>
List-Unsubscribe: <https://www.ietf.org/mailman/options/codec>, <mailto:codec-request@ietf.org?subject=unsubscribe>
List-Archive: <http://www.ietf.org/mail-archive/web/codec>
List-Post: <mailto:codec@ietf.org>
List-Help: <mailto:codec-request@ietf.org?subject=help>
List-Subscribe: <https://www.ietf.org/mailman/listinfo/codec>, <mailto:codec-request@ietf.org?subject=subscribe>
X-List-Received-Date: Thu, 03 May 2012 22:57:27 -0000

Jean-Marc wrote:
> Also, fortunately prior art gets established based on the time Silk
> was available even in obfuscated form.

The first incarnation of SILK was included in the Skype application in
January 2009.  While there are some algorithmic differences with SILK in
Opus, most of the methods are the same.  You can download the old SILK
from the Skype developer web site, if you wanted to compare the two.

best,
koen.


From rjsparks@nostrum.com  Mon May 14 11:02:36 2012
Return-Path: <rjsparks@nostrum.com>
X-Original-To: codec@ietfa.amsl.com
Delivered-To: codec@ietfa.amsl.com
Received: from localhost (localhost [127.0.0.1]) by ietfa.amsl.com (Postfix) with ESMTP id 8650021F8458 for <codec@ietfa.amsl.com>; Mon, 14 May 2012 11:02:36 -0700 (PDT)
X-Virus-Scanned: amavisd-new at amsl.com
X-Spam-Flag: NO
X-Spam-Score: -101.299
X-Spam-Level: 
X-Spam-Status: No, score=-101.299 tagged_above=-999 required=5 tests=[AWL=-1.300, BAYES_50=0.001, HTML_MESSAGE=0.001, SPF_PASS=-0.001, USER_IN_WHITELIST=-100]
Received: from mail.ietf.org ([12.22.58.30]) by localhost (ietfa.amsl.com [127.0.0.1]) (amavisd-new, port 10024) with ESMTP id orqNBhQclQWv for <codec@ietfa.amsl.com>; Mon, 14 May 2012 11:02:29 -0700 (PDT)
Received: from nostrum.com (nostrum-pt.tunnel.tserv2.fmt.ipv6.he.net [IPv6:2001:470:1f03:267::2]) by ietfa.amsl.com (Postfix) with ESMTP id BE3FD21F847F for <codec@ietf.org>; Mon, 14 May 2012 11:02:26 -0700 (PDT)
Received: from dn3-177.estacado.net (vicuna-alt.estacado.net [75.53.54.121]) (authenticated bits=0) by nostrum.com (8.14.3/8.14.3) with ESMTP id q4EI2NgO063539 (version=TLSv1/SSLv3 cipher=DHE-RSA-CAMELLIA256-SHA bits=256 verify=NO) for <codec@ietf.org>; Mon, 14 May 2012 13:02:24 -0500 (CDT) (envelope-from rjsparks@nostrum.com)
Message-ID: <4FB148AF.7040304@nostrum.com>
Date: Mon, 14 May 2012 13:02:23 -0500
From: Robert Sparks <rjsparks@nostrum.com>
User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.6; rv:10.0.2) Gecko/20120216 Thunderbird/10.0.2
MIME-Version: 1.0
To: codec@ietf.org
References: <1337001184.23527.1544.camel@mightyatom.folly.org.uk>
In-Reply-To: <1337001184.23527.1544.camel@mightyatom.folly.org.uk>
X-Forwarded-Message-Id: <1337001184.23527.1544.camel@mightyatom.folly.org.uk>
Content-Type: multipart/alternative; boundary="------------030501080507040708020806"
Received-SPF: pass (nostrum.com: 75.53.54.121 is authenticated by a trusted mechanism)
Subject: [codec] Fwd: [Gen-art] ***SPAM*** 6.93 (5) Gen-art last call review of draft-ietf-codec-opus-12.txt (completed)
X-BeenThere: codec@ietf.org
X-Mailman-Version: 2.1.12
Precedence: list
List-Id: Codec WG <codec.ietf.org>
List-Unsubscribe: <https://www.ietf.org/mailman/options/codec>, <mailto:codec-request@ietf.org?subject=unsubscribe>
List-Archive: <http://www.ietf.org/mail-archive/web/codec>
List-Post: <mailto:codec@ietf.org>
List-Help: <mailto:codec-request@ietf.org?subject=help>
List-Subscribe: <https://www.ietf.org/mailman/listinfo/codec>, <mailto:codec-request@ietf.org?subject=subscribe>
X-List-Received-Date: Mon, 14 May 2012 18:02:36 -0000

This is a multi-part message in MIME format.
--------------030501080507040708020806
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Content-Transfer-Encoding: 7bit


-------- Original Message --------
Subject: 	[Gen-art] ***SPAM*** 6.93 (5) Gen-art last call review of 
draft-ietf-codec-opus-12.txt (completed)
Date: 	Mon, 14 May 2012 14:13:04 +0100
From: 	Elwyn Davies <elwynd@folly.org.uk>
Organization: 	Folly Consulting
To: 	General Area Review Team <gen-art@ietf.org>, 
draft-ietf-codec-opus.all@tools.ietf.org
CC: 	IETF discussion <ietf@ietf.org>


I am the assigned Gen-ART reviewer for this draft. For background on
Gen-ART, please see the FAQ at
<http://wiki.tools.ietf.org/area/gen/trac/wiki/GenArtfaq>.

Please resolve these comments along with any other Last Call comments
you may receive.

Document: draft-ietf-codec-opus-12.txt
Reviewer:  Elwyn Davies
Review Date:  14 May 2012 (completed)
IETF LC End Date: 10 May 2012
IESG Telechat date: (if known) -

Summary:
Before offering some views on the document, let me say that this piece
of work seems to be a tour de force on behalf of its developers.  It is
certainly one of the (if not the) most technically complex pieces of
work that has been presented to the IETF, and is far more mathematically
complex and rigorous than our usual run of standards.  It is also
perhaps a vindication of the rough consensus and running code
philosophy. So congratulations to the authors and the companies that
have supported this work.  But back to the review....

I came to this review with a very outdated view of what goes on inside
codecs, so it was rather a shock to the system.  Taking that into
account, I think that some additional high level, up front description
of the two parts (SILK and CELT) and the range encoding system would
make it easier for future (naive) readers and potential implementers to
understand what is going on.  For example, after struggling with the
description of CELT in the decoder (s4.3) I found that going off and
reading the slides on CELT presented at IETF 79 helped considerably.
Such additions might also help future dissemination of the technique by
giving potential users a quick overview.  It would also make it easier
to justify the decision to postpone the high level (and highly
mathematical/technical) view to section 5 after the detailed description
of the decoder.

I also found that the descriptions of the SILK and CELT decoders
appeared to be written with a different perspective (see detailed
comments) doubtless by different authors.  This is not ideal and,
despite the expected further increase in page count, some additional
text in s4.3 seems desirable.

By far the most difficult to parse section is 4.3.3 talking about bit
allocation in CELT.  Despite considerable study, I didn't feel I had
really got to grips with how this worked.  An example might help.

Finally, the most contentious aspect of this document is the requirement
to treat the code as normative.  There are just over 30,000 physical
lines of code and I have hardly checked them at all.  SLOCCount
reckons this represents 7.14 man years of effort.  As with the document,
there is a discrepancy between CELT and SILK with the latter code being
more heavily commented, especially as regards routine parameters.  A
proper validation of the claim that the code implements the description
would take several weeks of time: I guess that we have to take the
document shepherd's assurances on this.  One issue is that both code and
document are pretty much saturated with what we used to call 'magic
numbers'.  Although these are explained in the document, it does not
appear that this is always the case in the code.  I would also be
happier if the code contained Doxygen (or similar) documentation
statements for all routines (rather than just the API).
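
For reference, the effort figure quoted above is consistent with the
basic COCOMO "organic" model that SLOCCount applies by default
(person-months = 2.4 * KSLOC^1.05).  The sketch below is only a rough
check: the function name is made up for illustration, and the count of
30,100 lines is an assumed stand-in for "just over 30,000", not a number
taken from the review.

```python
def cocomo_person_years(sloc, a=2.4, b=1.05):
    """Basic COCOMO (organic mode) effort estimate, in person-years.

    a and b are the organic-mode coefficients; sloc is the number of
    source lines of code.
    """
    ksloc = sloc / 1000.0
    person_months = a * ksloc ** b
    return person_months / 12.0


# Assumed line count (hypothetical): "just over 30,000" taken as 30,100.
estimate = cocomo_person_years(30100)
print(f"{estimate:.2f} person-years")
```

With that assumed count the estimate comes out at roughly seven person-
years, in line with the figure reported in the review.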

So overall, the work is something that ought to be standardized but the
document needs further work - not quite ready for the IESG.

Major issues:
Can we accept the code as normative?  If not how do we proceed?

Minor issues:
Contrast between decoder descriptions of LP part and CELT part:  The
SILK descriptions go into gory detail on the values used in lots of
tables, etc., whereas the CELT part has a very limited treatment of the
numeric values used (assuming reliance on finding the values in the
reference implementation, either explicitly or implicitly).  There are
things to be said for both techniques.  I was wondering (while reading
the SILK description) if the authors have any means of automatically
generating the tables from the code in the SILK part (or vice versa) to
avoid double maintenance. On the other hand, there are pieces of the
CELT decoder description (especially s4.3.3, which depends on knowing the
numbers of bands, etc.) where some actual numbers would help comprehension.

s2 (and more generally):  The splitting of the signal in the frequency
domain into signal (components?) below and above 8kHz in the hybrid case
presumably requires that the output of the CELT encoder is windowed so
that only one of the encoders generates output below 8 kHz.  I think something
is needed in s2 to explain how this is managed (presumably to do with
energy bands?).  I didn't find anything in s5 about what has to be done
for the encoder when running in hybrid mode which surprised me somewhat.

s4.1: Despite having found a copy of the range coding paper and tried to
read it, I found the description of range coding opaque (there are some
more words later in s5 but this is a bit late for understanding the
decoder).  Given that this is (I think) fairly novel, some additional
less opaque description could be very helpful to people trying to
understand what is going on. In particular knowledge of how entropy
coding works is pretty much a prerequisite.

s4.2.5, para 3:
>     When switching from 20 ms to 10 ms, the 10 ms
>     20 ms Opus frame, potentially leaving a hole that needs to be
>     concealed from even a single packet loss.
How?

s4.2.5, para 4:
>  In order to properly produce LBRR frames under all conditions, an
>     encoder might need to buffer up to 60 ms of audio and re-encode it
>     during these transitions.  However, the reference implementation opts
>     to disable LBRR frames at the transition point for simplicity.
>
Should this be phrased in RFC 2119 requirements language?  The first
part sounds like a SHOULD with the second part being the get-out, but
it's not entirely clear what the consequences are other than simplicity.


======================================================================
             MINOR ISSUES ABOVE submitted as part 1 of review
======================================================================
general/s11.2:  Several references [Hadamard], [Viterbi], etc., are to
Wikipedia pages.  Whilst these are convenient (and only illustrative)
they are not guaranteed to be very stable.  Better (i.e., more stable)
references are desirable.

s1, para 2:
>     The decoder contains a great deal of integer and fixed-point
>     arithmetic which must be performed exactly, including all rounding
>     considerations,...
Is this a MUST?  There are instances in the text which might contradict
this (e.g., para 1 of s4.2.7.5 which has (capitalized) SHOULDs).

s4.3:
As a complete newcomer to CELT, I would have appreciated a more high
level understanding of what CELT is doing at this point.  I  tried
reading s4.3 without any additional input and found it very hard going.
Eventually I gave up and went looking for some additional input.  This
presentation seems to have a useful view
http://www.ietf.org/proceedings/79/slides/codec-2.pdf

I think that it would be extremely helpful to have a description similar
to this at this point in the document, even though there is some
material in section 5.3 which could also be forward referenced.  Still
the material in s5.3 does not start from the basic principles that CELT
is using, and since these are essentially novel, it would be very good
to give prospective implementers/users an understanding of what is going
on.  Incidentally, I found the above IETF presentation more useful than
http://www.celt-codec.org/presentations/misc/lca-celt.pdf
Note that the SILK part seems rather less opaque.  It would also be
useful to indicate numerically how many bands are involved and what the
number of MDCT bins are in the various bands.

s4.3, last para: Can the 'out of range error' occur in the LP decoder? (if not why not?)

======================================================================

Nits/editorial comments:

global: bytes ->  octets
global: The form/terminology Q<n>  (e.g., Q13, Q15, Q16) ought to be
explained.
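
As a minimal sketch of what such an explanation might cover, assuming the
usual fixed-point convention in which a Q<n> value is an integer holding the
real value scaled by 2**n (n fractional bits).  The helper names to_q and
from_q are hypothetical and are not taken from the draft or the code.

```python
def to_q(value, n):
    """Encode a real value as a Q<n> integer (n fractional bits), rounding to nearest."""
    return int(round(value * (1 << n)))


def from_q(fixed, n):
    """Decode a Q<n> integer back to a real value."""
    return fixed / (1 << n)


# 0.5 in Q15 is 0.5 * 2**15; 1.0 in Q13 is 2**13.
half_q15 = to_q(0.5, 15)
one_q13 = to_q(1.0, 13)
print(half_q15, from_q(half_q15, 15), one_q13)
```

Under this reading, Q13, Q15 and Q16 differ only in how many low-order bits
are treated as the fractional part.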

s1: Expand CELP

s1.1.6: Need to define floor().

s2: ? Expand SILK

s2: Reference for Vorbis.

s2.1.8: Expand AAC (and MP3).

s3: References for Ogg, Matroska, maybe RTP.

s3: Fig 1, Table 2 and intervening text:  Presumably SILK only (Table 2)
etc correspond to MODES 1-3 in Figure 1. This needs to be consistent.

s3.2.1: Make it clear which of the octets is len[0]/len[1].  To be
precise it might be better to say len0/len1 are the values of the two
length octets (in whichever order you intend). The form len[0] could be
misinterpreted as a function 'length of 0'.

s3.2.5: better s/figure below/Figure 5/

s3.2.5:
>  In the CBR case, the compressed length of each frame in bytes is
>     equal to the number of remaining bytes in the packet after
>     subtracting the (optional) padding, (N-2-P), divided by M. This
>     number MUST be a non-negative integer multiple of M.
'This number' is not the  compressed length of each frame that is the
subject of the first sentence, but the number of remaining octets - this
needs rewording.
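
The intended check can be sketched as follows, using the names N, P and M
from the quoted text.  This assumes that P counts every octet consumed by
padding and that the constant 2 is the header term from the quoted
expression (N-2-P); the function name is hypothetical.

```python
def cbr_frame_length(n, p, m):
    """Return the per-frame length in octets for the CBR case, or None if invalid.

    n: total packet size, p: padding octets, m: number of frames.
    The remaining octets (n - 2 - p) must be a non-negative multiple of m;
    it is that remainder, not the per-frame length, which is constrained.
    """
    remaining = n - 2 - p
    if m <= 0 or remaining < 0 or remaining % m != 0:
        return None
    return remaining // m


print(cbr_frame_length(62, 0, 3))  # 60 remaining octets split over 3 frames
print(cbr_frame_length(63, 0, 3))  # 61 remaining octets cannot be split evenly
```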

s3.2.5:
>  The number of header bytes (TOC byte, frame
>     count byte, padding length bytes, and frame length bytes), plus the
>     length of the first M-1 frames themselves, plus the length of the
>     padding MUST be no larger than N, the total size of the packet.
Surely this is a non sequitur? This might be better phrased as 'The
total size of a well formed packet MUST be at least...'

s3.3: The example diagrams ought to have figure numbers.

s3.4: I am not keen on duplicating normative requirements in this way
(double maintenance issue).  It would be better to put explicit numbered
requirements in the sections above and reference the resulting numbers
here.

s4.1:
>  The decoder initializes rng to 128 and initializes val to
>     127 minus the top 7 bits of the first input octet.
How are the 'top seven bits' to be interpreted here? e.g. as the bottom
seven bits of a 8 bit integer field? an 8 bit integer with the lowest
bit zeroed out?
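
One possible reading, sketched here purely as an assumption, is that 'the top
7 bits' means the octet shifted right by one, so that the lowest bit is left
over for the next step.  The function name and the returned leftover bit are
illustrative and not taken from the draft.

```python
def init_range_decoder(first_octet):
    """Initialise (rng, val) from the first input octet under the 'shift right by one' reading.

    Also returns the lowest bit of the octet, which this reading leaves unused.
    """
    rng = 128
    val = 127 - (first_octet >> 1)  # top 7 bits taken as a value in 0..127
    leftover_bit = first_octet & 1
    return rng, val, leftover_bit


print(init_range_decoder(0x00))
print(init_range_decoder(0xFF))
```

With this interpretation val always lies in the range 0 to 127, which is
consistent with rng being 128.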


s4.1.1:  This is really a global point.  This section refers to
entdec.c.  Presumably (since we haven't reached the code yet, and it is
still compressed) there is some file structure.  I don't think this has
been said above.  It would be good to provide a list of the file
components (i.e., sectional structure of the code) at the start, maybe
even  giving line number positions within the decompressed code.


s4.1.1.1:
>  Then it reads the next octet of the
>     payload and combines it with the left-over bit buffered from the
>     previous octet to form the 8-bit value sym.  It takes the left-over
>     bit as the high bit (bit 7) of sym, and the top 7 bits of the octet
>     it just read as the other 7 bits of sym.
This is not well phrased.  Better
      Then it reads the next octet of the payload [packet? payload hasn't
really been used before] and combines the left  over bit from the
previous octet (see Section 4.1 for starting this process) as the high
bit (bit 7) of 'sym' and the top 7 bits of the octet as the other 7
bits of sym, leaving the remaining bit for the next iteration.
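
The bit handling described in the quoted text can be sketched as below.  This
is only an illustration of the wording, assuming 'top 7 bits' means the octet
shifted right by one; the function name next_sym is hypothetical.

```python
def next_sym(leftover_bit, octet):
    """Form the 8-bit value sym and return it with the new left-over bit.

    The left-over bit from the previous octet becomes bit 7 of sym, the top
    7 bits of the current octet become bits 6..0, and the lowest bit of the
    current octet is kept for the next iteration.
    """
    sym = ((leftover_bit & 1) << 7) | (octet >> 1)
    return sym, octet & 1


sym, leftover = next_sym(1, 0b10101011)
print(hex(sym), leftover)
```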

s4.1.5.2:
Should r_Q15 = rng>>  (l-16) be r_Q15 = rng>>  (lg-16)?  There doesn't
seem to be an 'l' defined.

s4.2.1: Expand LTP earlier. It would also be useful to expand LPC again.

s4.2.2: acronym VAD is not expanded until the beginning of s4.2.3.

s4.2.7: acronym LSF needs to be expanded on first use.

s4.2.7.1: Explain briefly why Table 7 has values for indices 0 to 15
when wi0/1 are in range 0 to 14.

s4.2.7.4, para below Table 12:
>  These 6 bits are combined to form a gain index between 0 and 63.

s/gain index/gain_index/ as this variable is used subsequently.

s4.2.7.4: The use of log_gain seems slightly confusing when combined
with gain_index.  One at least is presumably log scaled.  Maybe a bit
more explanation is needed.

======================================================================
             COMMENTS ABOVE submitted as part 1 of review
======================================================================
General: Define L1 and L2 (as in L1-norm, etc).

s4.2.7.2, last para:
>  In that case, if this
>     flag is zero (indicating that there should be a side channel), then
>     Packet Loss Concealment (PLC, see Section 4.4) SHOULD be invoked to
>     recover a side channel signal.
What are the consequences (or what actions need to be taken) if it is
not invoked?

s4.2.7.5, para 1:
>  These represent the interleaved zeros on the
>     unit circle between 0 and pi (hence "normalized") in the standard
>     decomposition of the LPC filter into a symmetric part and an anti-
>     symmetric part (P and Q in Section 4.2.7.5.6).
'on the unit circle between 0 and pi' might be clearer as 'on the upper
half of the unit circle' or 'on the half of the unit circle in the
positive imaginary area of the complex plane'.
'standard decomposition'?  Needs a reference.

s4.2.7.5, para 1: A reference for the use of LSF in LPC would be useful.

s4.2.7.5.x: There is inconsistent use of stage 1/stage 2 vs
stage-1/stage-2.  Please be consistent.

s4.2.7.5, para 1:
>     Because of non-linear
>     effects in the decoding process, an implementation SHOULD match the
>     fixed-point arithmetic described in this section exactly.  An encoder
>     SHOULD also use the same process.
- Does this contradict the 'must' in s1, para 2?
- What are the consequences of ignoring the SHOULD?  How bad would they
get?  Might it become unstable and how would one know?

s4.2.7.5.1, para 1: s/This indexes an element in a coarse codebook,
    selects the PDFs for the second stage of the VQ/This indexes an
    element in a coarse codebook that selects the PDFs for the second stage
    of the VQ/

s4.2.7.5.3, last para:
>  However, nothing in
>     either the reconstruction process or the quantization process in the
>     encoder thus far guarantees that the coefficients are monotonically
>     increasing and separated well enough to ensure a stable filter.
A reference that indicates why this requirement is needed would be desirable.
(and also for s4.2.5.7.8).

s4.2.7.5.4 and Table 25: Are the values in Table 25 NDeltaMin or NDeltaMin_Q15?
The equations after Table 25 use both NDeltaMin and NDeltaMin_Q15.  Is this correct?
In particular the first two equations deliver _Q15 values but use raw NDeltaMin.

s4.1.1/s4.2.7.1 and other places:  The term 'exact integer division' is
used in various places.  My understanding was that this phrase implied
that it was known that the dividend was an exact multiple of the divisor
by some out-of-band means.  This doesn't seem to be the case generally
in Opus (e.g., where both n/5 and n%5  are needed - clearly this doesn't
anticipate n%5 == 0 every time!)  So what does 'exact integer division'
imply?  A definition may be needed.
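
To illustrate the point for non-negative n only: the sketch below computes
the quotient and remainder together and shows that the remainder is not
always zero, so the dividend is not generally a multiple of the divisor.
What the draft means by 'exact' is not settled by this example; the function
name is hypothetical.

```python
def split_by_5(n):
    """Return (n / 5, n % 5) as integers for a non-negative integer n."""
    quotient, remainder = divmod(n, 5)
    return quotient, remainder


for n in (10, 13):
    q, r = split_by_5(n)
    # The pair always reconstructs n, whether or not 5 divides n.
    print(n, q, r, n == 5 * q + r)
```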

s4.3, last para: s/described in the figure above./described in Table 55 above./

s4.3.1:
>  The "transient" flag encoded in the bitstream has a probability of 1/8.
This statement appears out of the blue apparently.  Some more
explanation of what the transient flag actually implies and why we
should be so sure about its PDF would help.

s4.3.2.1: Arguably a reference is needed for the z-transform.

s4.3.2.1: Avoid the equation picture splitting across page boundaries.
In the current version it is unclear what the denominator is (use the
needLines processing directive in xml2rfc).  Same applies to the
equation below Table 57 in s4.3.4.3.

4.3.2.1, after the equations:
>  The
>     prediction is clamped internally so that fixed point implementations
>     with limited dynamic range do not suffer desynchronization.
As a person with limited skills in the art, I have no idea what
desynchronization implies here.

4.3.2.1, ibid:
>  We
>     approximate the ideal probability distribution of the prediction
>     error using a Laplace distribution with separate parameters for each
>     frame size in intra- and inter-frame modes.
I suspect this sentence belongs before the equation describing the z-transform.
Where are the values of the parameters for the inter-frame mode defined
(the intra-frame ones are in the text)?

s4.3.2.3: Paragraph on decoding band boosts:  Might be improved by using
equations rather than the wordy descriptions used at present.

(global)s4.3.2.3, para above table 56: s/iff/if and only if/

s4.3.2.3: LOG2_FRAC_TABLE is missing.

s4.3.3: It would be helpful to explain either here, or at the outset of
s4.3 overall, how the concept of energy bands and MDCT bins applies to
the CELT part of the codec, and just how many bands and bins are used.
Some of this is contained in s5.3.2, but the magic number 17 appears
later in 4.3.3 which is presumably something to do with the point in the
frequency domain that CELT takes over from LP in the hybrid mode.  It
would make the very complex section 4.3.3 rather easier to understand
with this extra information - I have to say I struggled!  On reflection,
I think an example of what bits are allocated to a band and how they are
subsequently used would be quite helpful - Without going to delve into
the code I am really not clear that I understand just what bits are
allocated and what they then encode and I have read the text quite a few
times now.

s4.3.3: Be consistent between 'tone to noise' and 'tone-to-noise'.

s4.3.3:
>     The band-energy normalized structure of Opus MDCT mode ensures that a
>     constant bit allocation for the shape content of a band will result
>     in a roughly constant tone to noise ratio, which provides for fairly
>     consistent perceptual performance.  The effectiveness of this
>     approach is the result of two factors: that the band energy, which is
>     understood to be perceptually important on its own, is always
>     preserved regardless of the shape precision, and because the constant
>     tone-to-noise ratio implies a constant intra-band noise to masking
>     ratio.  Intra-band masking is the strongest of the perceptual masking
>     effects.  This structure means that the ideal allocation is more
>     consistent from frame to frame than it is for other codecs without an
>     equivalent structure.
This paragraph contains a number of interesting assertions:  Is there a
reference where one could see them justified (it may be that this is the
result of original research in the Opus team).

s4.3.3, paragraphs after the bullet points:  The concepts of 'shape' and
'shape encoding' are introduced here without explicit definition.  Are we
talking about the shape windowing used in FFT/MDCT here? This should be
made clear.

s4.3.3, 5th para after bullet points: s/In the reference the maximums/In
the reference implementation the maximums/

s4.3.3, 6th para after the bullet points:  A table of bands per mode and
number of MDCT bins  covered would be helpful here in order to  get a
feeling for the scale of the problem.  Also the cache_caps50 table in
the code contains the magic number 168.  Where does this come from?

s4.3.3, 6th para after the bullet points:
>  set LM to the shift value for the frame size (e.g. 0 for 120, 1 for
>     240, 3 for 480),
Where do these frame sizes get specified? And what is the total set of
frame sizes? The text says 'e.g.' (which incidentally should be 'e.g.,')
implying that this is not the complete set.
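
As a sketch of one way the mapping could be stated, assuming (and this is an
assumption, not something the quoted text says) that the shift value is
log2(frame_size / 120).  That assumption matches the first two quoted
examples; under it, 480 would map to 2 and 960 to 3, so the third quoted
example may be worth checking.  The function name is hypothetical.

```python
def shift_for_frame_size(frame_size, base=120):
    """Return log2(frame_size / base); raise ValueError if the ratio is not a power of two."""
    ratio, remainder = divmod(frame_size, base)
    if remainder != 0 or ratio < 1 or (ratio & (ratio - 1)) != 0:
        raise ValueError("frame size is not base * 2**k")
    return ratio.bit_length() - 1


for size in (120, 240, 480, 960):
    print(size, shift_for_frame_size(size))
```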

s4.3.3, 6th para after the bullet points: Need to define 'truncating
integer division' to go with 'exact integer division'.

s4.3.3, 7th para after the bullet points:
>  The band boosts are represented by a series of binary symbols which
>     are coded with very low probability.
How many, at least, and what values?  Are these range encoded? I don't
see them in the table above or with a PDF specified.

s4.3.3, 7th para after the bullet points:
>     and every time
>     a band is boosted the initial cost is reduced (down to a minimum of
>     two).
Would that be a value of two or two bits?

s4.3.3: Paragraph on decoding band boosts:  Might be improved by using
equations rather than the wordy descriptions used at present.

(global)s4.3.3, para above table 56: s/iff/if and only if/

s4.3.3, 2nd para after Table 56:
>  For stereo frames, bits are reserved for intensity stereo and for
>     dual stereo.  Intensity stereo requires ilog2(end-start) bits.
The terms 'intensity stereo' and 'dual stereo' don't appear to have
been defined.

s4.3.3: LOG2_FRAC_TABLE is missing.

4.3.4.3, last para:
>     If the decoded vector represents more than one time block, then the
>     following process is applied separately on each time block.
Should this sentence come before the previous paragraph?  There  isn't
really a 'following process' in this section and I don't think it means
the process in s4.3.4.4?


s4.3.4.3, last sentence:
>  This extra rotation is applied in an interleaved manner with a stride
>     equal to round(sqrt(N/nb_blocks))
I think this needs some more explanation for the uninitiated.

s4.3.4.4:
>  Multiple levels of splitting may be
>     applied up to a frame size dependent limit.
What this limit is does not appear to be defined.

s4.3.5: The 'collapse' phenomenon is not fully defined, and it would be
useful to mention why it happens.  Also s/min/minimum/.

s4.3.6: s/Just like/Just as/

s4.3.7, last para: I think 'power complementarity' requires further
explanation or a reference.

s4.5, para 3: s/To avoid or reduces glitches during these/To avoid or
reduce glitches during these/

s4.5.1.1, para 1: s/For for SILK-only/For SILK-only/

s4.5.1.4, para 2: s/redundant frame is as-is,/redundant frame as-is,/

s5, figure 16: The Optional High-pass Filter box has two spurious '+'
symbols on the vertical sides.

s5, last para:  A reference for the Auto Regressive Moving Average
(ARMA) filter would be useful.

s5.2.3.4.2.1, title: s/Burgs method/Burg's Method/

s5.2.3.5, para 2: Expand 'R/D performance' (or probably specify it as
abbreviation for rate-distortion in para 1).

s5.3.5, para 1 below equation: Is E an abbreviation for 'extra'?

s5.3.6: The abbreviation RD for rate-distortion is defined here (see
comment on s5.2.3.5).

s6.1: This section is perhaps a little 'triumphalist' for the reference
implementation (this may of course be justified!).  The quality metric is
a 'relative quality metric' and presumably if someone does a *really*
good job of coding, it is entirely possible that the new algorithms
might get better than 100 on the quality score (i.e., provide a better
answer than the reference implementation).

s6.2: Just wondering, but is non-standard frame size the only option
offered by Opus Custom?  If not, probably more text is needed here.  Are
there any major changes to the algorithms implied by the use of Opus
Custom?

s7.  Referencing SECGUIDE (RFC 3552) seems inappropriate since it occurs
in such a security considerations section. Just omit it.

s11.2: A number of references are to Wikipedia pages.  While these were
useful to me in refreshing or initializing my knowledge, they are not
usually considered adequately stable for use in RFCs.  I fear you may
have to provide more stable references.

Appendix A: I checked that the code extracted as indicated and could be
compiled under Ubuntu 10.04 LTS.


_______________________________________________
Gen-art mailing list
Gen-art@ietf.org
https://www.ietf.org/mailman/listinfo/gen-art


--------------030501080507040708020806
Content-Type: text/html; charset=ISO-8859-1
Content-Transfer-Encoding: 7bit

<html>
  <head>

    <meta http-equiv="content-type" content="text/html; charset=ISO-8859-1">
  </head>
  <body bgcolor="#FFFFFF" text="#000000">
    <br>
    <br>
    -------- Original Message --------
    <table class="moz-email-headers-table" border="0" cellpadding="0"
      cellspacing="0">
      <tbody>
        <tr>
          <th align="RIGHT" nowrap="nowrap" valign="BASELINE">Subject: </th>
          <td>[Gen-art] ***SPAM*** 6.93 (5) Gen-art last call review of
            draft-ietf-codec-opus-12.txt (completed)</td>
        </tr>
        <tr>
          <th align="RIGHT" nowrap="nowrap" valign="BASELINE">Date: </th>
          <td>Mon, 14 May 2012 14:13:04 +0100</td>
        </tr>
        <tr>
          <th align="RIGHT" nowrap="nowrap" valign="BASELINE">From: </th>
          <td>Elwyn Davies <a class="moz-txt-link-rfc2396E" href="mailto:elwynd@folly.org.uk">&lt;elwynd@folly.org.uk&gt;</a></td>
        </tr>
        <tr>
          <th align="RIGHT" nowrap="nowrap" valign="BASELINE">Organization:
          </th>
          <td>Folly Consulting</td>
        </tr>
        <tr>
          <th align="RIGHT" nowrap="nowrap" valign="BASELINE">To: </th>
          <td>General Area Review Team <a class="moz-txt-link-rfc2396E" href="mailto:gen-art@ietf.org">&lt;gen-art@ietf.org&gt;</a>,
            <a class="moz-txt-link-abbreviated" href="mailto:draft-ietf-codec-opus.all@tools.ietf.org">draft-ietf-codec-opus.all@tools.ietf.org</a></td>
        </tr>
        <tr>
          <th align="RIGHT" nowrap="nowrap" valign="BASELINE">CC: </th>
          <td>IETF discussion <a class="moz-txt-link-rfc2396E" href="mailto:ietf@ietf.org">&lt;ietf@ietf.org&gt;</a></td>
        </tr>
      </tbody>
    </table>
    <br>
    <br>
    <pre>I am the assigned Gen-ART reviewer for this draft. For background on 
Gen-ART, please see the FAQ at 
<a class="moz-txt-link-rfc2396E" href="http://wiki.tools.ietf.org/area/gen/trac/wiki/GenArtfaq">&lt;http://wiki.tools.ietf.org/area/gen/trac/wiki/GenArtfaq&gt;</a>. 

Please resolve these comments along with any other Last Call comments 
you may receive. 

Document: draft-ietf-codec-opus-12.txt
Reviewer:  Elwyn Davies
Review Date:  14 May 2012 (completed)
IETF LC End Date: 10 May 2012
IESG Telechat date: (if known) -

Summary: 
Before offering some views on the document, let me say that this piece
of work seems to be a tour de force on behalf of its developers.  It is
certainly one of the (if not the) most technically complex pieces of
work that has been presented to the IETF, and is far more mathematically
complex and rigorous than our usual run of standards.  It is also
perhaps a vindication of the rough consensus and running code
philosophy. So congratulations to the authors and the companies that
have supported this work.  But back to the review....

I came to this review with a very outdated view of what goes on inside
codecs, so it was rather a shock to the system.  Taking that into
account, I think that some additional high level, up front description
of the two parts (SILK and CELT) and the range encoding system would
make it easier for future (naive) readers and potential implementers to
understand what is going on.  For example, after struggling with the
description of CELT in the decoder (s4.3) I found that going off and
reading the slides on CELT presented at IETF 79 helped considerably.
Such additions might also help future dissemination of the technique by
giving potential users a quick overview.  It would also make it easier
to justify the decision to postpone the high level (and highly
mathematical/technical) view to section 5 after the detailed description
of the decoder.

I also found that the descriptions of the SILK and CELT decoders
appeared to be written with a different perspective (see detailed
comments) doubtless by different authors.  This is not ideal and,
despite the expected further increase in page count, some additional
text in s4.3 seems desirable.

By far the most difficult to parse section is 4.3.3 talking about bit
allocation in CELT.  Despite considerable study, I didn't feel I had
really got to grips with how this worked.  An example might help.

Finally, the most contentious aspect of this document is the requirement
to treat the code as normative.  There are just over 30,000 physical
lines of code and I have hardly checked them at all.  SLOCCount
reckons this represents 7.14 man years of effort.  As with the document,
there is a discrepancy between CELT and SILK with the latter code being
more heavily commented, especially as regards routine parameters.  A
proper validation of the claim that the code implements the description
would take several weeks of time: I guess that we have to take the
document shepherd's assurances on this.  One issue is that both code and
document are pretty much saturated with what we used to call 'magic
numbers'.  Although these are explained in the document, it does not
appear that this is always the case in the code.  I would also be
happier if the code contained Doxygen (or similar) documentation
statements for all routines (rather than just the API).

So overall, the work is something that ought to be standardized but the
document needs further work - not quite ready for the IESG. 

Major issues: 
Can we accept the code as normative?  If not how do we proceed?

Minor issues:
Contrast between decoder descriptions of LP part and CELT part:  The
SILK descriptions go into gory detail on the values used in lots of
tables, etc., whereas the CELT part has a very limited treatment of the
numeric values used (assuming reliance on finding the values in the
reference implementation, either explicitly or implicitly).  There are
things to be said for both techniques.  I was wondering (while reading
the SILK description) if the authors have any means of automatically
generating the tables from the code in the SILK part (or vice versa) to
avoid double maintenance. On the other hand, there are pieces of the
CELT decoder description (especially s4.3.3, which depends on knowing the
numbers of bands, etc.) where some actual numbers would help comprehension.

s2 (and more generally):  The splitting of the signal in the frequency
domain into signal (components?) below and above 8kHz in the hybrid case
presumably requires that the output of the CELT encoder is windowed so
that only one of the encoders generates output below 8 kHz.  I think something
is needed in s2 to explain how this is managed (presumably to do with
energy bands?).  I didn't find anything in s5 about what has to be done
for the encoder when running in hybrid mode which surprised me somewhat.

s4.1: Despite having found a copy of the range coding paper and tried to
read it, I found the description of range coding opaque (there are some
more words later in s5 but this is a bit late for understanding the
decoder).  Given that this is (I think) fairly novel, some additional
less opaque description could be very helpful to people trying to
understand what is going on. In particular knowledge of how entropy
coding works is pretty much a prerequisite. 

s4.2.5, para 3:
&gt;    When switching from 20 ms to 10 ms, the 10 ms
&gt;    20 ms Opus frame, potentially leaving a hole that needs to be
&gt;    concealed from even a single packet loss.
How?

s4.2.5, para 4: 
&gt; In order to properly produce LBRR frames under all conditions, an
&gt;    encoder might need to buffer up to 60 ms of audio and re-encode it
&gt;    during these transitions.  However, the reference implementation opts
&gt;    to disable LBRR frames at the transition point for simplicity.
&gt; 
Should this be phrased in RFC 2119 requirements language?  The first
part sounds like a SHOULD with the second part being the get-out, but
it's not entirely clear what the consequences are other than simplicity.


======================================================================
            MINOR ISSUES ABOVE submitted as part 1 of review
======================================================================
general/s11.2:  Several references [Hadamard], [Viterbi], etc., are to
Wikipedia pages.  Whilst these are convenient (and only illustrative)
they are not guaranteed to be very stable.  Better (i.e., more stable)
references are desirable.

s1, para 2:
&gt;    The decoder contains a great deal of integer and fixed-point
&gt;    arithmetic which must be performed exactly, including all rounding
&gt;    considerations,...
Is this a MUST?  There are instances in the text which might contradict
this (e.g., para 1 of s4.2.7.5 which has (capitalized) SHOULDs).

s4.3:
As a complete newcomer to CELT, I would have appreciated a more high
level understanding of what CELT is doing at this point.  I  tried
reading s4.3 without any additional input and found it very hard going.
Eventually I gave up and went looking for some additional input.  This
presentation seems to have a useful view 
<a class="moz-txt-link-freetext" href="http://www.ietf.org/proceedings/79/slides/codec-2.pdf">http://www.ietf.org/proceedings/79/slides/codec-2.pdf</a>

I think that it would be extremely helpful to have a description similar
to this at this point in the document, even though there is some
material in section 5.3 which could also be forward referenced.  Still
the material in s5.3 does not start from the basic principles that CELT
is using, and since these are essentially novel, it would be very good
to give prospective implementers/users an understanding of what is going
on.  Incidentally, I found the above IETF presentation more useful than
<a class="moz-txt-link-freetext" href="http://www.celt-codec.org/presentations/misc/lca-celt.pdf">http://www.celt-codec.org/presentations/misc/lca-celt.pdf</a>
Note that the SILK part seems rather less opaque.  It would also be
useful to indicate numerically how many bands are involved and what the
number of MDCT bins are in the various bands. 

s4.3, last para: Can the 'out of range error' occur in the LP decoder? (if not why not?)

======================================================================

Nits/editorial comments: 

global: bytes -&gt; octets
global: The form/terminology Q&lt;n&gt; (e.g., Q13, Q15, Q16) ought to be
explained.

s1: Expand CELP

s1.1.6: Need to define floor().

s2: ? Expand SILK

s2: Reference for Vorbis.

s2.1.8: Expand AAC (and MP3).

s3: References for Ogg, Matroska, maybe RTP.

s3: Fig 1, Table 2 and intervening text:  Presumably SILK only (Table 2)
etc correspond to MODES 1-3 in Figure 1. This needs to be consistent.

s3.2.1: Make it clear which of the octets is len[0]/len[1].  To be
precise it might be better to say len0/len1 are the values of the two
length octets (in whichever order you intend). The form len[0] could be
misinterpreted as a function 'length of 0'.

s3.2.5: better s/figure below/Figure 5/

s3.2.5:
&gt; In the CBR case, the compressed length of each frame in bytes is
&gt;    equal to the number of remaining bytes in the packet after
&gt;    subtracting the (optional) padding, (N-2-P), divided by M. This
&gt;    number MUST be a non-negative integer multiple of M.
'This number' is not the  compressed length of each frame that is the
subject of the first sentence, but the number of remaining octets - this
needs rewording.

s3.2.5:
&gt; The number of header bytes (TOC byte, frame
&gt;    count byte, padding length bytes, and frame length bytes), plus the
&gt;    length of the first M-1 frames themselves, plus the length of the
&gt;    padding MUST be no larger than N, the total size of the packet.
Surely this is a non sequitur? This might be better phrased as 'The
total size of a well formed packet MUST be at least...' 

s3.3: The example diagrams ought to have figure numbers.

s3.4: I am not keen on duplicating normative requirements in this way
(double maintenance issue).  It would be better to put explicit numbered
requirements in the sections above and reference the resulting numbers
here. 

s4.1:
&gt; The decoder initializes rng to 128 and initializes val to
&gt;    127 minus the top 7 bits of the first input octet. 
How are the 'top seven bits' to be interpreted here? e.g. as the bottom
seven bits of a 8 bit integer field? an 8 bit integer with the lowest
bit zeroed out?


s4.1.1:  This is really a global point.  This section refers to
entdec.c.  Presumably (since we haven't reached the code yet, and it is
still compressed) there is some file structure.  I don't think this has
been said above.  It would be good to provide a list of the file
components (i.e., sectional structure of the code) at the start, maybe
even  giving line number positions within the decompressed code.


s4.1.1.1:
&gt; Then it reads the next octet of the
&gt;    payload and combines it with the left-over bit buffered from the
&gt;    previous octet to form the 8-bit value sym.  It takes the left-over
&gt;    bit as the high bit (bit 7) of sym, and the top 7 bits of the octet
&gt;    it just read as the other 7 bits of sym.
This is not well phrased.  Better
     Then it reads the next octet of the payload [packet? payload hasn't
really been used before] and combines the left  over bit from the
previous octet (see Section 4.1 for starting this process) as the high
bit (bit 7) of 'sym' and the top 7 bits of the octet as the other 7
bits of sym, leaving the remaining bit for the next iteration.  

s4.1.5.2:
Should r_Q15 = rng &gt;&gt; (l-16) be r_Q15 = rng &gt;&gt; (lg-16)?  There doesn't
seem to be an 'l' defined.

s4.2.1: Expand LTP earlier. It would also be useful to expand LPC again.
 
s4.2.2: acronym VAD is not expanded until the beginning of s4.2.3.

s4.2.7: acronym LSF needs to be expanded on first use.

s4.2.7.1: Explain briefly why Table 7 has values for indices 0 to 15
when wi0/1 are in range 0 to 14.

s4.2.7.4, para below Table 12:
&gt; These 6 bits are combined to form a gain index between 0 and 63.

s/gain index/gain_index/ as this variable is used subsequently.

s4.2.7.4: The use of log_gain seems slightly confusing when combined
with gain_index.  One at least is presumably log scaled.  Maybe a bit
more explanation is needed.

======================================================================
            COMMENTS ABOVE submitted as part 1 of review
======================================================================
General: Define L1 and L2 (as in L1-norm, etc).

s4.2.7.2, last para:
&gt; In that case, if this
&gt;    flag is zero (indicating that there should be a side channel), then
&gt;    Packet Loss Concealment (PLC, see Section 4.4) SHOULD be invoked to
&gt;    recover a side channel signal.
What are the consequences (or what actions need to be taken) if it is
not invoked?

s4.2.7.5, para 1:
&gt; These represent the interleaved zeros on the
&gt;    unit circle between 0 and pi (hence "normalized") in the standard
&gt;    decomposition of the LPC filter into a symmetric part and an anti-
&gt;    symmetric part (P and Q in Section 4.2.7.5.6).
'on the unit circle between 0 and pi' might be clearer as 'on the upper
half of the unit circle' or 'on the half of the unit circle in the
positive imaginary area of the complex plane'.
'standard decomposition'?  Needs a reference.

s4.2.7.5, para 1: A reference for the use of LSF in LPC would be useful.

s4.2.7.5.x: There is inconsistent use of stage 1/stage 2 vs
stage-1/stage-2.  Please be consistent.

s4.2.7.5, para 1:
&gt;    Because of non-linear
&gt;    effects in the decoding process, an implementation SHOULD match the
&gt;    fixed-point arithmetic described in this section exactly.  An encoder
&gt;    SHOULD also use the same process.
- Does this contradict the 'must' in s1, para 2?
- What are the consequences of ignoring the SHOULD?  How bad would they
get?  Might it become unstable and how would one know?

s4.2.7.5.1, para 1: s/This indexes an element in a coarse codebook,
   selects the PDFs for the second stage of the VQ/This indexes an 
   element in a coarse codebook that selects the PDFs for the second stage 
   of the VQ/

s4.2.7.5.3, last para: 
&gt; However, nothing in
&gt;    either the reconstruction process or the quantization process in the
&gt;    encoder thus far guarantees that the coefficients are monotonically
&gt;    increasing and separated well enough to ensure a stable filter.
A reference that indicates why this requirement is needed would be desirable.
(and also for s4.2.5.7.8).

s4.2.7.5.4 and Table 25: Are the values in Table 25 NDeltaMin or NDeltaMin_Q15?
The equations after Table 25 use both NDeltaMin and NDeltaMin_Q15.  Is this correct?
In particular the first two equations deliver _Q15 values but use raw NDeltaMin.

s4.1.1/s4.2.7.1 and other places:  The term 'exact integer division' is
used in various places.  My understanding was that this phrase implied
that it was known that the dividend was an exact multiple of the divisor
by some out-of-band means.  This doesn't seem to be the case generally
in Opus (e.g., where both n/5 and n%5 are needed - clearly this doesn't
anticipate n%5 == 0 every time!)  So what does 'exact integer division'
imply?  A definition may be needed. 

s4.3, last para: s/described in the figure above./described in Table 55 above./

s4.3.1: 
&gt; The "transient" flag encoded in the bitstream has a probability of 1/8. 
This statement appears out of the blue apparently.  Some more
explanation of what the transient flag actually implies and why we
should be so sure about its PDF would help.

s4.3.2.1: Arguably a reference is needed for the z-transform.

s4.3.2.1: Avoid the equation picture splitting across page boundaries.
In the current version it is unclear what the denominator is (use the
needLines processing directive in xml2rfc).  The same applies to the
equation below Table 57 in s4.3.4.3.

4.3.2.1, after the equations:
&gt; The
&gt;    prediction is clamped internally so that fixed point implementations
&gt;    with limited dynamic range do not suffer desynchronization.
As a person with limited skills in the art, I have no idea what
desynchronization implies here. 

4.3.2.1, ibid:
&gt; We
&gt;    approximate the ideal probability distribution of the prediction
&gt;    error using a Laplace distribution with separate parameters for each
&gt;    frame size in intra- and inter-frame modes.
I suspect this sentence belongs before the equation describing the z-transform.
Where are the values of the parameters for the inter-frame mode defined
(the intra-frame ones are in the text)?

s4.3.2.3: Paragraph on decoding band boosts:  Might be improved by using
equations rather than the wordy descriptions used at present.

(global)s4.3.2.3, para above table 56: s/iff/if and only if/

s4.3.2.3: LOG2_FRAC_TABLE is missing.

s4.3.3: It would be helpful to explain either here, or at the outset of
s4.3 overall, how the concept of energy bands and MDCT bins applies to
the CELT part of the codec, and just how many bands and bins are used.
Some of this is contained in s5.3.2, but the magic number 17 appears
later in 4.3.3 which is presumably something to do with the point in the
frequency domain that CELT takes over from LP in the hybrid mode.  It
would make the very complex section 4.3.3 rather easier to understand
with this extra information - I have to say I struggled!  On reflection,
I think an example of what bits are allocated to a band and how they are
subsequently used would be quite helpful - Without going to delve into
the code I am really not clear that I understand just what bits are
allocated and what they then encode and I have read the text quite a few
times now.

s4.3.3: Be consistent between 'tone to noise' and 'tone-to-noise'.

s4.3.3:
&gt;    The band-energy normalized structure of Opus MDCT mode ensures that a
&gt;    constant bit allocation for the shape content of a band will result
&gt;    in a roughly constant tone to noise ratio, which provides for fairly
&gt;    consistent perceptual performance.  The effectiveness of this
&gt;    approach is the result of two factors: that the band energy, which is
&gt;    understood to be perceptually important on its own, is always
&gt;    preserved regardless of the shape precision, and because the constant
&gt;    tone-to-noise ratio implies a constant intra-band noise to masking
&gt;    ratio.  Intra-band masking is the strongest of the perceptual masking
&gt;    effects.  This structure means that the ideal allocation is more
&gt;    consistent from frame to frame than it is for other codecs without an
&gt;    equivalent structure.
This paragraph contains a number of interesting assertions:  Is there a
reference where one could see them justified (it may be that this is the
result of original research in the Opus team).
 
s4.3.3, paragraphs after the bullet points:  The concepts of 'shape' and
'shape encoding' are introduced here without explicit definition.  Are we
talking about the shape windowing used in FFT/MDCT here? This should be
made clear.

s4.3.3, 5th para after bullet points: s/In the reference the maximums/In
the reference implementation the maximums/

s4.3.3, 6th para after the bullet points:  A table of bands per mode and
number of MDCT bins  covered would be helpful here in order to  get a
feeling for the scale of the problem.  Also the cache_caps50 table in
the code contains the magic number 168.  Where does this come from?

s4.3.3, 6th para after the bullet points:
&gt; set LM to the shift value for the frame size (e.g. 0 for 120, 1 for
&gt;    240, 3 for 480),  
Where do these frame sizes get specified? And what is the total set of
frame sizes? The text says 'e.g.' (which incidentally should be 'e.g.,')
implying that this is not the complete set.

s4.3.3, 6th para after the bullet points: Need to define 'truncating
integer division' to go with 'exact integer division'.

s4.3.3, 7th para after the bullet points: 
&gt; The band boosts are represented by a series of binary symbols which
&gt;    are coded with very low probability.
How many, at least, and what values?  Are these range encoded? I don't
see them in the table above or with a PDF specified.

s4.3.3, 7th para after the bullet points:
&gt;    and every time
&gt;    a band is boosted the initial cost is reduced (down to a minimum of
&gt;    two).
Would that be a value of two or two bits? 
  
s4.3.3: Paragraph on decoding band boosts:  Might be improved by using
equations rather than the wordy descriptions used at present.

(global)s4.3.3, para above table 56: s/iff/if and only if/

s4.3.3, 2nd para after Table 56: 
&gt; For stereo frames, bits are reserved for intensity stereo and for
&gt;    dual stereo.  Intensity stereo requires ilog2(end-start) bits.
The terms 'intensity stereo' and 'dual stereo' don't appear to have
been defined.

s4.3.3: LOG2_FRAC_TABLE is missing.

4.3.4.3, last para:
&gt;    If the decoded vector represents more than one time block, then the
&gt;    following process is applied separately on each time block. 
Should this sentence come before the previous paragraph?  There  isn't
really a 'following process' in this section and I don't think it means
the process in s4.3.4.4?


s4.3.4.3, last sentence: 
&gt; This extra rotation is applied in an interleaved manner with a stride
&gt;    equal to round(sqrt(N/nb_blocks))
I think this needs some more explanation for the uninitiated.

s4.3.4.4:
&gt; Multiple levels of splitting may be
&gt;    applied up to a frame size dependent limit. 
What this limit is does not appear to be defined.

s4.3.5: The 'collapse' phenomenon is not fully defined, and it would be
useful to mention why it happens.  Also s/min/minimum/.

s4.3.6: s/Just like/Just as/

s4.3.7, last para: I think 'power complementarity' requires further
explanation or a reference.

s4.5, para 3: s/To avoid or reduces glitches during these/To avoid or
reduce glitches during these/

s4.5.1.1, para 1: s/For for SILK-only/For SILK-only/

s4.5.1.4, para 2: s/redundant frame is as-is,/redundant frame as-is,/

s5, figure 16: The Optional High-pass Filter box has two spurious '+'
symbols on the vertical sides. 

s5, last para:  A reference for the Auto Regressive Moving Average
(ARMA) filter would be useful.

s5.2.3.4.2.1, title: s/Burgs method/Burg's Method/

s5.2.3.5, para 2: Expand 'R/D performance' (or probably specify it as
abbreviation for rate-distortion in para 1).

s5.3.5, para 1 below equation: Is E an abbreviation for 'extra'?

s5.3.6: The abbreviation RD for rate-distortion is defined here (see
comment on s5.2.3.5).

s6.1: This section is perhaps a little 'triumphalist' for the reference
implementation (this may of course be justified!).  The quality metric is
a 'relative quality metric' and presumably if someone does a *really*
good job of coding, it is entirely possible that the new algorithms
might get better than 100 on the quality score (i.e., provide a better
answer than the reference implementation).

s6.2: Just wondering, but is non-standard frame size the only option
offered by Opus Custom?  If not, probably more text is needed here.  Are
there any major changes to the algorithms implied by the use of Opus
Custom?

s7.  Referencing SECGUIDE (RFC 3552) seems inappropriate since it occurs
in such a security considerations section. Just omit it.

s11.2: A number of references are to Wikipedia pages.  While these were
useful to me in refreshing or initializing my knowledge, they are not
usually considered adequately stable for use in RFCs.  I fear you may
have to provide more stable references.

Appendix A: I checked that the code extracted as indicated and was able
to be compiled under Ubuntu 10.04 LTS.


_______________________________________________
Gen-art mailing list
<a class="moz-txt-link-abbreviated" href="mailto:Gen-art@ietf.org">Gen-art@ietf.org</a>
<a class="moz-txt-link-freetext" href="https://www.ietf.org/mailman/listinfo/gen-art">https://www.ietf.org/mailman/listinfo/gen-art</a>
</pre>
  </body>
</html>

--------------030501080507040708020806--

From internet-drafts@ietf.org  Tue May 15 17:24:41 2012
Return-Path: <internet-drafts@ietf.org>
X-Original-To: codec@ietfa.amsl.com
Delivered-To: codec@ietfa.amsl.com
Received: from localhost (localhost [127.0.0.1]) by ietfa.amsl.com (Postfix) with ESMTP id 5141D11E80CC; Tue, 15 May 2012 17:24:41 -0700 (PDT)
X-Virus-Scanned: amavisd-new at amsl.com
X-Spam-Flag: NO
X-Spam-Score: -102.543
X-Spam-Level: 
X-Spam-Status: No, score=-102.543 tagged_above=-999 required=5 tests=[AWL=0.056, BAYES_00=-2.599, USER_IN_WHITELIST=-100]
Received: from mail.ietf.org ([12.22.58.30]) by localhost (ietfa.amsl.com [127.0.0.1]) (amavisd-new, port 10024) with ESMTP id 3g+3Tf+Ar9Zi; Tue, 15 May 2012 17:24:40 -0700 (PDT)
Received: from ietfa.amsl.com (localhost [127.0.0.1]) by ietfa.amsl.com (Postfix) with ESMTP id E624811E8095; Tue, 15 May 2012 17:24:40 -0700 (PDT)
MIME-Version: 1.0
Content-Type: text/plain; charset="utf-8"
Content-Transfer-Encoding: quoted-printable
From: internet-drafts@ietf.org
To: i-d-announce@ietf.org
X-Test-IDTracker: no
X-IETF-IDTracker: 4.02
Message-ID: <20120516002440.18768.91498.idtracker@ietfa.amsl.com>
Date: Tue, 15 May 2012 17:24:40 -0700
Cc: codec@ietf.org
Subject: [codec] I-D Action: draft-ietf-codec-opus-13.txt
X-BeenThere: codec@ietf.org
X-Mailman-Version: 2.1.12
Precedence: list
List-Id: Codec WG <codec.ietf.org>
List-Unsubscribe: <https://www.ietf.org/mailman/options/codec>, <mailto:codec-request@ietf.org?subject=unsubscribe>
List-Archive: <http://www.ietf.org/mail-archive/web/codec>
List-Post: <mailto:codec@ietf.org>
List-Help: <mailto:codec-request@ietf.org?subject=help>
List-Subscribe: <https://www.ietf.org/mailman/listinfo/codec>, <mailto:codec-request@ietf.org?subject=subscribe>
X-List-Received-Date: Wed, 16 May 2012 00:24:41 -0000

A New Internet-Draft is available from the on-line Internet-Drafts directories.
This draft is a work item of the Internet Wideband Audio Codec Working Group
of the IETF.

	Title           : Definition of the Opus Audio Codec
	Author(s)       : Jean-Marc Valin
                          Koen Vos
                          Timothy B. Terriberry
	Filename        : draft-ietf-codec-opus-13.txt
	Pages           : 329
	Date            : 2012-05-15

   This document defines the Opus interactive speech and audio codec.
   Opus is designed to handle a wide range of interactive audio
   applications, including Voice over IP, videoconferencing, in-game
   chat, and even live, distributed music performances.  It scales from
   low bitrate narrowband speech at 6 kb/s to very high quality stereo
   music at 510 kb/s.  Opus uses both linear prediction (LP) and the
   Modified Discrete Cosine Transform (MDCT) to achieve good compression
   of both speech and music.


A URL for this Internet-Draft is:
http://www.ietf.org/internet-drafts/draft-ietf-codec-opus-13.txt

Internet-Drafts are also available by anonymous FTP at:
ftp://ftp.ietf.org/internet-drafts/

This Internet-Draft can be retrieved at:
ftp://ftp.ietf.org/internet-drafts/draft-ietf-codec-opus-13.txt

The IETF datatracker page for this Internet-Draft is:
https://datatracker.ietf.org/doc/draft-ietf-codec-opus/


From jmvalin@jmvalin.ca  Tue May 15 17:33:30 2012
Return-Path: <jmvalin@jmvalin.ca>
X-Original-To: codec@ietfa.amsl.com
Delivered-To: codec@ietfa.amsl.com
Received: from localhost (localhost [127.0.0.1]) by ietfa.amsl.com (Postfix) with ESMTP id 7D18321F85F2; Tue, 15 May 2012 17:33:30 -0700 (PDT)
X-Virus-Scanned: amavisd-new at amsl.com
X-Spam-Flag: NO
X-Spam-Score: -2.599
X-Spam-Level: 
X-Spam-Status: No, score=-2.599 tagged_above=-999 required=5 tests=[BAYES_00=-2.599]
Received: from mail.ietf.org ([12.22.58.30]) by localhost (ietfa.amsl.com [127.0.0.1]) (amavisd-new, port 10024) with ESMTP id EQNhI0S0FQaq; Tue, 15 May 2012 17:33:26 -0700 (PDT)
Received: from relais.videotron.ca (relais.videotron.ca [24.201.245.36]) by ietfa.amsl.com (Postfix) with ESMTP id 07FB921F85F1; Tue, 15 May 2012 17:33:26 -0700 (PDT)
MIME-version: 1.0
Content-type: multipart/mixed; boundary="Boundary_(ID_+C3bRGW2uyqqCN0LMd/yZQ)"
Received: from [192.168.1.14] ([96.21.20.94]) by VL-VM-MR006.ip.videotron.ca (Oracle Communications Messaging Exchange Server 7u4-22.01 64bit (built Apr 21 2011)) with ESMTP id <0M4300KGTAVO2L70@VL-VM-MR006.ip.videotron.ca>; Tue, 15 May 2012 20:33:25 -0400 (EDT)
Message-id: <4FB2F5D0.7070701@jmvalin.ca>
Date: Tue, 15 May 2012 20:33:20 -0400
From: Jean-Marc Valin <jmvalin@jmvalin.ca>
User-Agent: Mozilla/5.0 (X11; Linux i686 on x86_64; rv:12.0) Gecko/20120428 Thunderbird/12.0.1
To: Elwyn Davies <elwynd@folly.org.uk>
References: <1337001184.23527.1544.camel@mightyatom.folly.org.uk>
In-reply-to: <1337001184.23527.1544.camel@mightyatom.folly.org.uk>
X-Enigmail-Version: 1.4.1
Cc: General Area Review Team <gen-art@ietf.org>, "codec@ietf.org" <codec@ietf.org>, IETF discussion <ietf@ietf.org>, draft-ietf-codec-opus.all@tools.ietf.org
Subject: Re: [codec] Gen-art last call review of draft-ietf-codec-opus-12.txt (completed)
X-BeenThere: codec@ietf.org
X-Mailman-Version: 2.1.12
Precedence: list
List-Id: Codec WG <codec.ietf.org>
List-Unsubscribe: <https://www.ietf.org/mailman/options/codec>, <mailto:codec-request@ietf.org?subject=unsubscribe>
List-Archive: <http://www.ietf.org/mail-archive/web/codec>
List-Post: <mailto:codec@ietf.org>
List-Help: <mailto:codec-request@ietf.org?subject=help>
List-Subscribe: <https://www.ietf.org/mailman/listinfo/codec>, <mailto:codec-request@ietf.org?subject=subscribe>
X-List-Received-Date: Wed, 16 May 2012 00:33:30 -0000

This is a multi-part message in MIME format.

--Boundary_(ID_+C3bRGW2uyqqCN0LMd/yZQ)
Content-type: text/plain; CHARSET=US-ASCII
Content-transfer-encoding: 7BIT

Hi Elwyn,

Thanks for the very thorough review. We've addressed your issues and
submitted draft version -13. See our response to each of the issues you
raised (aggregated from all the authors) in the attached document.

Cheers,

	Jean-Marc

On 12-05-14 09:13 AM, Elwyn Davies wrote:
> I am the assigned Gen-ART reviewer for this draft. For background on 
> Gen-ART, please see the FAQ at 
> <http://wiki.tools.ietf.org/area/gen/trac/wiki/GenArtfaq>. 
> 
> Please resolve these comments along with any other Last Call comments 
> you may receive. 
> 
> Document: draft-ietf-codec-opus-12.txt
> Reviewer:  Elwyn Davies
> Review Date:  14 May 2012 (completed)
> IETF LC End Date: 10 May 2012
> IESG Telechat date: (if known) -
> 

--Boundary_(ID_+C3bRGW2uyqqCN0LMd/yZQ)
Content-type: text/plain; CHARSET=US-ASCII; name=Gen-art.txt
Content-transfer-encoding: 7BIT
Content-disposition: attachment; filename=Gen-art.txt

Elwyn Davies wrote:
> Major issues: 
> Can we accept the code as normative?  If not how do we proceed?

The issue with code being normative was specifically addressed in 
the guidelines document for this WG (RFC 6569).

> Minor issues:
> Contrast between decoder descriptions of LP part and CELT part:  The
> SILK descriptions go into gory detail on the values used in lots of
> tables, etc., whereas the CELT part has a very limited treatment of the
> numeric values used (assuming reliance on finding the values in the
> reference implementation, either explicitly or implicitly).  There are
> things to be said for both techniques.  I was wondering (while reading
> the SILK description) if the authors have any means of automatically
> generating the tables from the code in the SILK part (or vice versa) to
> avoid double maintenance. On the other hand, there are pieces of the
> CELT decoder description (especially in s4.3.3 where knowing numbers of
> bands, etc.) where some actual numbers would help comprehension.
> 

We have made many changes to section 4.3 (and 4.3.3 specifically) to address
the specific issues below. As for the tables, they are not generated
automatically.


> s2 (and more generally):  The splitting of the signal in the frequency
> domain into signal (components?) below and above 8kHz presumably
> requires that the signal is subjected to a Discrete Fourier Transform to
> allow the signal to be split.  I think something is needed in s2 to
> explain how this is managed (or if I don't understand, to explain why it
> isn't necessary).

No DFT is used. The lower band is obtained through resampling (which is already described) and the higher band is obtained by not coding the lower band with CELT (the text says that CELT starts at band 17 in hybrid mode). The explanation was reworded to make this as clear as possible at this point in the text.

> s4.1: Despite having found a copy of the range coding paper and tried to
> read it, I found the whole business of range coding opaque.  Given that
> this is (I think) fairly novel, some additional less opaque description
> could be very helpful to people trying to understand what is going on.
> In particular knowledge of how entropy coding works is pretty much a
> prerequisite.

Added a link to the Wikipedia article on range coding.

> s4.2.5, para 3:
>>     When switching from 20 ms to 10 ms, the 10 ms
>>     20 ms Opus frame, potentially leaving a hole that needs to be
>>     concealed from even a single packet loss.
> How?

As explained in the LBRR text, a 10 ms frame will only contain 10 ms LBRR data even if the previous frame was 20 ms, so there's 10 ms "missing".

> s4.2.5, para 4:
>> In order to properly produce LBRR frames under all conditions, an
>>     encoder might need to buffer up to 60 ms of audio and re-encode it
>>     during these transitions.  However, the reference implementation opts
>>     to disable LBRR frames at the transition point for simplicity.
>>
> Should this be phrased in RFC 2119 requirements language?  The first
> part sounds like a SHOULD with the second part being the get-out, but
> it's not entirely clear what the consequences are other than simplicity.

There really is no SHOULD there. We're just describing possible strategies for encoding. In general, we have avoided 2119 language as much as possible for the encoder. However, the sentence, "Since transitions are relatively infrequent in normal usage, this does not have a significant impact on packet loss robustness," was added to alleviate any concerns about the consequences of this decision.

> ======================================================================
>              MINOR ISSUES ABOVE submitted as part 1 of review
> ======================================================================
> general/s11.2:  Several references [Hadamard], [Viterbi], etc., are to
> Wikipedia pages.  Whilst these are convenient (and only illustrative)
> they are not guaranteed to be very stable.  Better (i.e., more stable)
> references are desirable.

We believe that the links themselves should be quite stable. We could 
also link specific revisions in Wikipedia, but the quality of the content
is more likely to get better than worse. Considering
that these references are informational, we believe that Wikipedia is a
good source.

> s1, para 2:
>>     The decoder contains a great deal of integer and fixed-point
>>     arithmetic which must be performed exactly, including all rounding
>>     considerations,...
> Is this a MUST?  There are instances in the text which might contradict
> this (e.g., para 1 of s4.2.7.5 which has (capitalized) SHOULDs).

This sentence did not mean to imply that _all_ such arithmetic must be performed exactly, only that there exists a good deal of arithmetic for which this is true (for example, to achieve the final range check MUST from Section 6). Since it isn't clear at this point exactly which arithmetic is being referred to, we can't use RFC 2119 strength here. Replaced "must be" with "needs to be" to avoid the impression of an RFC 2119 requirement.

> s4.3:
> As a complete newcomer to CELT, I would have appreciated a more high
> level understanding of what CELT is doing at this point.  I  tried
> reading s4.3 without any additional input and found it very hard going.
> Eventually I gave up and went looking for some additional input.  This
> presentation seems to have a useful view 
> http://www.ietf.org/proceedings/79/slides/codec-2.pdf
> I think that it would be extremely helpful to have a description similar
> to this at this point in the document, even though there is some
> material in section 5.3 which could also be forward referenced.  Still
> the material in s5.3 does not start from the basic principles that CELT
> is using, and since these are essentially novel, it would be very good
> to give prospective implementers/users an understanding of what is going
> on.  Incidentally, I found the above IETF presentation more useful than
> http://www.celt-codec.org/presentations/misc/lca-celt.pdf
> Note that the SILK part seems rather less opaque.  It would also be
> useful to indicate numerically how many bands are involved and what the
> number of MDCT bins are in the various bands. 

The intro of section 4.3 has been expanded with general information about CELT
similar to what the codec-2.pdf slides from 79 included. 

> s4.3, last para: Can the 'out of range error' occur in the LP decoder? (if not why not?)

Added the following text:
"Such out of range errors cannot occur in the SILK layer."


> ======================================================================
> 
> Nits/editorial comments:
> 
> global: bytes ->  octets

We believe that in the field of audio codecs, the mention of "byte" without
further context is well understood to mean 8 bits.

> global: The form/terminology Q<n>  (e.g., Q13, Q15, Q16) ought to be
> explained.

This was already defined in the section on notation:
   The notation "Q<n>", where n
   is an integer, denotes the number of binary digits to the right of
   the decimal point in a fixed-point number. 
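
As a rough illustration of this convention, here is a minimal Python
sketch. The helper names to_q and from_q are hypothetical and do not appear
in the draft or the reference code; the rounding used here is only for the
illustration and is not the rounding any particular part of the decoder
requires.

```python
def to_q(x, n):
    """Convert a real number x to a Q<n> integer, i.e. an integer that
    carries n binary digits to the right of the point."""
    # Illustrative rounding only; Python's round() rounds ties to even.
    return int(round(x * (1 << n)))


def from_q(v, n):
    """Interpret the integer v as a Q<n> fixed-point value."""
    return v / (1 << n)


# 0.5 expressed in Q15 is 0.5 * 2**15.
half_q15 = to_q(0.5, 15)
```

With this reading, a variable carrying a _Q15 suffix holds its real value
multiplied by 2**15.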

> s1: Expand CELP

Done.
 
> s1.1.6: Need to define floor().

Done. Also defined abs(), ceil(), and round().

> s2: ? Expand SILK

SILK is not an acronym.

> s2: Reference for Vorbis.

Linked to http://xiph.org/vorbis/

> s2.1.8: Expand AAC (and MP3).

Expanded
MP3: MPEG 1, Layer 3
AAC: Advanced Audio Coding

> s3: References for Ogg, Matroska, maybe RTP.

Done.

> s3: Fig 1, Table 2 and intervening text:  Presumably SILK only (Table 2)
> etc correspond to MODES 1-3 in Figure 1. This needs to be consistent.

Changed LP to SILK and MDCT to CELT.

> s3.2.1: Make it clear which of the octets is len[0]/len[1].  To be
> precise it might be better to say len0/len1 are the values of the two
> length octets (in whichever order you intend). The form len[0] could be
> misinterpreted as a function 'length of 0'.

Replaced len[0] and len[1] with first_byte and second_byte, respectively.

> s3.2.5: better s/figure below/Figure 5/

Fixed for all cases of "figure below".

> s3.2.5:
>> In the CBR case, the compressed length of each frame in bytes is
>>     equal to the number of remaining bytes in the packet after
>>     subtracting the (optional) padding, (N-2-P), divided by M. This
>>     number MUST be a non-negative integer multiple of M.
> 'This number' is not the  compressed length of each frame that is the
> subject of the first sentence, but the number of remaining octets - this
> needs rewording.

Fixed by defining R=N-2-P, e.g.,
   In the CBR case, let R=N-2-P be the number of bytes remaining in the
   packet after subtracting the (optional) padding.  Then the compressed
   length of each frame in bytes is equal to R/M. The value R MUST be a
   non-negative integer multiple of M. The compressed data for all M
   frames follows, each of size R/M bytes, as illustrated in Figure 6.
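
The reworded rule can be sketched as follows. This is a minimal Python
illustration, not the reference decoder: the function name cbr_frame_length
is hypothetical, p is assumed to be the total padding P as the draft defines
it, and the constant 2 is assumed to account for the TOC byte and the frame
count byte.

```python
def cbr_frame_length(n, p, m):
    """Return the compressed length R/M of each frame in the CBR case.

    n: total packet size in bytes (N)
    p: padding in bytes (P), assumed to be the draft's definition
    m: number of frames in the packet (M)
    Raises ValueError when R is not a non-negative integer multiple of M.
    """
    if m <= 0:
        raise ValueError("M must be positive")
    r = n - 2 - p  # bytes remaining after the two header bytes and padding
    if r < 0 or r % m != 0:
        raise ValueError("R must be a non-negative integer multiple of M")
    return r // m
```

For example, N=32, P=0, M=3 gives R=30 and a frame length of 10 bytes, while
N=33 with the same P and M is rejected because R=31 is not a multiple of 3.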
 
> s3.2.5:
>> The number of header bytes (TOC byte, frame
>>     count byte, padding length bytes, and frame length bytes), plus the
>>     length of the first M-1 frames themselves, plus the length of the
>>     padding MUST be no larger than N, the total size of the packet.
> Surely this is a non sequitur? This might be better phrased as 'The
> total size of a well formed packet MUST be at least...'

Note that the text mentions the "length of the first M-1 frames", so the calculation does not include the size of the last frame. That being said, we also clarified by using "signaled length" instead of "length".

> s3.3: The example diagrams ought to have figure numbers.

Fixed.

> s3.4: I am not keen on duplicating normative requirements in this way
> (double maintenance issue).  It would be better to put explicit numbered
> requirements in the sections above and reference the resulting numbers
> here.

A checklist style summary is quite useful for implementors, and worth the maintenance burden for a document that is (hopefully) going to be published once and read many times. The list intentionally does not include any RFC 2119 keywords, to avoid any conflict should there (accidentally) be room to interpret the re-statement any differently from the original statement. Numbering the requirements and referencing the numbers is still a good idea, but it should be possible to read the list without flipping back and forth to the previous sections.

> s4.1:
>> The decoder initializes rng to 128 and initializes val to
>>     127 minus the top 7 bits of the first input octet.
> How are the 'top seven bits' to be interpreted here? e.g. as the bottom
> seven bits of a 8 bit integer field? an 8 bit integer with the lowest
> bit zeroed out?

Tried for something more explicit:
   Let b0 be the first input octet (or zero if there are no octets in
   this Opus frame).  The decoder initializes rng to 128 and initializes
   val to (127 - (b0>>1)), where (b0>>1) is the top 7 bits of the first
   input octet.  It saves the remaining bit, (b0&1), for use in the
   renormalization procedure described in ...
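
A minimal Python sketch of the initialization as worded above. The function
name range_decoder_init and the tuple it returns are hypothetical; the
reference implementation in entdec.c is organized differently.

```python
def range_decoder_init(frame):
    """Return (rng, val, leftover_bit) for the octets of one Opus frame.

    frame: a bytes-like object holding the frame's octets (may be empty).
    """
    # b0 is the first input octet, or zero if the frame has no octets.
    b0 = frame[0] if len(frame) > 0 else 0
    rng = 128
    val = 127 - (b0 >> 1)   # (b0 >> 1) is the top 7 bits of b0
    leftover_bit = b0 & 1   # saved for the renormalization procedure
    return rng, val, leftover_bit
```

An empty frame therefore starts with val equal to 127 and a leftover bit of
zero.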

> s4.1.1:  This is really a global point.  This section refers to
> entdec.c.  Presumably (since we haven't reached the code yet) and it is
> still compressed, there is some file structure.  I don't think this has
> been said above.  It would be good to provide a list of the file
> components (i.e., sectional structure of the code) at the start, maybe
> even  giving line number positions within the decompressed code.

In cases where it was deemed necessary (e.g. large functions), there are
indeed line numbers in references to the code. As for a list of files
we did not think it was useful because one already needs to decompress
the code to see the references.

> s4.1.1.1:
>> Then it reads the next octet of the
>>     payload and combines it with the left-over bit buffered from the
>>     previous octet to form the 8-bit value sym.  It takes the left-over
>>     bit as the high bit (bit 7) of sym, and the top 7 bits of the octet
>>     it just read as the other 7 bits of sym.
> This is not well phrased.  Better
>       Then it reads the next octet of the payload [packet? payload hasn't
> really been used before] and combines the left  over bit from the

"Packet" is inaccurate, as the next byte in the packet may come from a different Opus frame. Replaced with "Opus frame" instead.

> previous octet (see Section 4.1 for starting this process) as the high
> bit (bit 7)| of 'sym' and the top 7 bits of the octet as the other 7
> bits of sym, leaving the remaining bit for the next iteration.

This may be trying to do too much in a single sentence. Perhaps:
  Then it reads the next octet of the Opus frame and forms an 8-bit
  value sym, using the left-over bit buffered from the previous octet
  as the high bit and the top 7 bits of the octet just read as the
  other 7 bits of sym.  The remaining bit in the octet just read is
  buffered for use in the next iteration.  If no more input octets
  remain, it uses zero bits instead.  See
  <xref target="range-decoder-init"/> for the initialization used to
  process the first octet.
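
The octet handling described in that paragraph might be sketched as follows. This only covers forming sym, not the rest of renormalization, and the reader struct and function name are hypothetical:

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical reader state; names are illustrative only. */
typedef struct {
    const uint8_t *buf;  /* octets of the current Opus frame */
    size_t len;          /* number of octets in the frame */
    size_t pos;          /* index of the next unread octet */
    unsigned rem;        /* left-over bit buffered from the previous octet */
} OctetReaderSketch;

static unsigned next_sym(OctetReaderSketch *r)
{
    /* Past the end of the frame, zero bits are used instead. */
    unsigned b = (r->pos < r->len) ? r->buf[r->pos++] : 0;
    /* The left-over bit becomes bit 7; the top 7 bits of b fill bits 6..0. */
    unsigned sym = (r->rem << 7) | (b >> 1);
    r->rem = b & 1;  /* buffer the remaining bit for the next iteration */
    return sym;
}
```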

Also broke out range-decoder-init into its own section to make it easier to reference.

> s4.1.5.2:
> Should r_Q15 = rng>>  (l-16) be r_Q15 = rng>>  (lg-16)?  There doesn't
> seem to be an 'l' defined.

Fixed. Sorry; this is a leftover from changing "l" to "lg" in response to one of the AD's review comments.
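
For reference, the corrected expression reads as below. This is a sketch that assumes ilog() is the usual "number of bits needed" function and that rng is at least 2^16, so the shift count is never negative; the helper names are invented here:

```c
#include <stdint.h>

/* ilog(x): number of bits needed to represent x (0 for x == 0). */
static int ilog_sketch(uint32_t x)
{
    int n = 0;
    while (x != 0) { n++; x >>= 1; }
    return n;
}

/* Keeps the 16 most significant bits of rng. Assumes rng >= 2^16. */
static uint32_t top16_of_rng(uint32_t rng)
{
    int lg = ilog_sketch(rng);
    return rng >> (lg - 16);  /* r_Q15 = rng >> (lg-16) */
}
```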

> s4.2.1: Expand LTP earlier. It would also be useful to expand LPC again.

Done (both).

> s4.2.2: acronym VAD is not expanded until the beginning of s4.2.3.

Fixed.

> s4.2.7: acronym LSF needs to be expanded on first use.

Fixed.

> s4.2.7.1: Explain briefly why Table 7 has values for indices 0 to 15
> when wi0/1 are in range 0 to 14.

Added:
   Although wi0 and wi1 only have 15 possible values, Table 7 contains
   16 entries to allow interpolation between entry wi0 and (wi0 + 1)
   (and likewise for wi1).

> s4.2.7.4, para below Table 12:
>> These 6 bits are combined to form a gain index between 0 and 63.
> s/gain index/gain_index/ as this variable is used subsequently.

Used:
  These 6 bits are combined to form a value, gain_index, between 0 and
  63.
Also updated the "delta gain index" for subframes without an independent gain, below.

> s4.2.7.4: The use of log_gain seems slightly confusing when combined
> with gain_index.  One at least is presumably log scaled.  Maybe a bit
> more explanation is needed.

The gain_index happens to map directly to log_gain for subframes with an independent gain (with the exception of the clamping), but this is not true for subframes where the gain is coded relative to the previous subframe. We are open to suggestions that would make the need for this distinction clearer.

Also replaced "previous gain index" with "previous_log_gain", as it is the latter that actually matters.
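
To make the distinction concrete, here is a rough sketch of the two cases. The formulas and constants are written from memory of the draft text and should be treated as assumptions to check against s4.2.7.4; the function names are invented for illustration:

```c
static int max_int(int a, int b) { return a > b ? a : b; }

static int clamp_int(int lo, int x, int hi)
{
    return x < lo ? lo : (x > hi ? hi : x);
}

/* Independent gain: gain_index maps directly to log_gain, except for
 * the clamp against the previous subframe's log_gain. */
static int log_gain_independent(int gain_index, int previous_log_gain)
{
    return max_int(gain_index, previous_log_gain - 16);
}

/* Delta-coded gain: log_gain depends on previous_log_gain, so the
 * decoded index alone does not determine it. */
static int log_gain_delta(int delta_gain_index, int previous_log_gain)
{
    int a = 2 * delta_gain_index - 16;
    int b = previous_log_gain + delta_gain_index - 4;
    return clamp_int(0, max_int(a, b), 63);
}
```

In the second function the same delta_gain_index gives different log_gain values for different previous_log_gain values, which is why the text tracks previous_log_gain rather than the previous index.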

> ======================================================================
>              COMMENTS ABOVE submitted as part 1 of review
> ======================================================================
> s4.2.7.2, last para:
>> In that case, if this
>>     flag is zero (indicating that there should be a side channel), then
>>     Packet Loss Concealment (PLC, see Section 4.4) SHOULD be invoked to
>>     recover a side channel signal.
> What are the consequences (or what actions need to be taken) if it is
> not invoked?

Added:
  Otherwise, the stereo image will collapse.

> s4.2.7.5, para 1:
>> These represent the interleaved zeros on the
>>     unit circle between 0 and pi (hence "normalized") in the standard
>>     decomposition of the LPC filter into a symmetric part and an anti-
>>     symmetric part (P and Q in Section 4.2.7.5.6).
> 'on the unit circle between 0 and pi' might be clearer as 'on the upper
> half of the unit circle' or 'on the half of the unit circle in the
> positive imaginary area of the complex plane'.
> 'standard decomposition'?  Needs a reference.

Reworded as:
   These represent the interleaved zeros on the upper half of the unit
   circle (between 0 and pi, hence "normalized") in the standard
   decomposition [line-spectral-pairs] of the LPC filter into a
   symmetric part and an anti-symmetric part (P and Q in Section
   4.2.7.5.6).

Added the reference [line-spectral-pairs] pointing to http://en.wikipedia.org/wiki/Line_spectral_pairs

> s4.2.7.5, para 1: A reference for the use of LSF in LPC would be useful.

The above reference serves this function as well.

> s4.2.7.5.x: There is inconsistent use of stage 1/stage 2 vs
> stage-1/stage-2.  Please be consusistent

"Stage 1" is correct when referring to the noun by itself, or as an object of a verb ("stage 1 decoding"). "Stage-1" is correct when modifying another noun ("stage-1 index"). However, we changed the titles of some of the tables around to allow them to use the modifier form more consistently.

> s4.2.7.5, para 1:
>>     Because of non-linear
>>     effects in the decoding process, an implementation SHOULD match the
>>     fixed-point arithmetic described in this section exactly.  An encoder
>>     SHOULD also use the same process.
> - Does this contradict the 'must' in s1, para 2?

No, for the reasons described above.

> - What are the consequences of ignoring the SHOULD?  How bad would they
> get?  Might it become unstable and how would one know?

They would not become unstable. However, changing the exact arithmetic may
lead to a decoder output that is not close enough to that of the 
reference decoder to pass the conformance test.

> s4.2.7.5.1, para 1: s/This indexes an element in a coarse codebook,
>     selects the PDFs for the second stage of the VQ/This indexes an
>     element in a coarse codebook that selects the PDFs for the second stage
>     of the VQ/

The text as written is correct. The index I1 is what selects the PDFs for the second stage, not the vector from the coarse codebook in Tables 23 and 24. I.e., it's saying, "This does A, B, and C."

> s4.2.7.5.3, last para: 
>> However, nothing in
>>    either the reconstruction process or the quantization process in the
>>    encoder thus far guarantees that the coefficients are monotonically
>>    increasing and separated well enough to ensure a stable filter.
> A reference that indicates why this requirement is needed would be desirable.
> (and also for s4.2.5.7.8).

Added a reference to:
P. Kabal and R. P. Ramachandran, "The Computation of Line Spectral Frequencies 
Using Chebyshev Polynomials", IEEE Trans. Acoustics, Speech, Signal Processing,
vol. 34, no. 6, pp. 1419-1426, Dec. 1986.

> s4.2.7.5.4 and Table 25: Are the values in Table 25 NDeltaMin or NDelatMin_Q15?
> The equations after Table 25 use both NDeltaMin and NDeltaMin_Q15.  Is this correct?
> In particular the first two equations deliver _Q15 values but use raw NDeltaMin.

Good catch. All uses of NDeltaMin should have been NDeltaMin_Q15.

> s4.1.1/s4.2.7.1 and other places:  The term 'exact integer division' is
> used in various places.  My understanding was that this phrase implied
> that it was known that the dividend was an exact multiple of the divisor
> by some out-of-band means.  This doesn't seem to be the case generally
> in Opus (e.g,, where both n/5 and n%5  are needed - clearly this doesn't
> anticipate n%5 == 0 every time!)  So what does 'exact integer division'
> imply?  A definition may be needed. 

The "exact" here was merely meant in contrast to inexact floating-point
division. Removed "exact", leaving only "integer division" (which, like
all other operators, uses C conventions, as stated in Notation and
Conventions). Hopefully this should avoid people trying to read too much
into the phrase.
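
A small sketch of what "integer division" with C conventions means for the n/5 and n%5 case raised above (the struct and function names are just for this example):

```c
/* Split n into quotient and remainder by 5 using C integer operators. */
typedef struct { int quot; int rem; } DivMod5;

static DivMod5 divmod5(int n)
{
    DivMod5 r;
    r.quot = n / 5;  /* truncates toward zero (C99 and later) */
    r.rem  = n % 5;  /* satisfies n == 5*quot + rem */
    return r;
}
```

Nothing here requires n to be a multiple of 5; the remainder is simply whatever the truncated quotient leaves over.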

> s4.3, last para: s/described in the figure above./described in Table 55 above./

Fixed.

> s4.3.1: 
>> The "transient" flag encoded in the bitstream has a probability of 1/8. 
> This statement appears out of the blue apparently.  Some more
> explanation of what the transient flag actually implies and why we
> should be so sure about its PDF would help.

Added a bit of context and moved the probability 1/8 to later in the text.

> s4.3.2.1: Arguably a reference is needed for the z-transform.

Reference added.

> s4.3.2.1: Avoid the equation picture splitting across page boundaries.
> in the current version it is unclear what the denominator is. (use
> needLines processing direcrive in xml2rfc).  Same applies to the
> equation below Table 57 in s4.3.4.3.

It's not quite clear how to use needLines without undesirable side-effects.
Hopefully this is something the RFC editor should be able to handle.

> 4.3.2.1, after the equations:
>> The
>>    prediction is clamped internally so that fixed point implementations
>>    with limited dynamic range do not suffer desynchronization.
> As a person with limited skills in the srt, I have no idea what
> desynchronization implies here. 

Clarified to "always remain in the same state as floating point implementations".

> 4.3.2.1, ibid:
>> We
>>    approximate the ideal probability distribution of the prediction
>>    error using a Laplace distribution with separate parameters for each
>>    frame size in intra- and inter-frame modes.
> I suspect this sentence belongs before the equation described the z-transform.
> Where are the values of the parameters for the inter-frame mode defined
> (the intra-frame ones are in the text)?

The probability model is for the difference between the energy and its
prediction. Added a reference to the table where the pdf parameters are
located in the code.

> s4.3.2.3: Paragraph on decoding band boosts:  Might be improved by using
> equations rather than the wordy descriptions used at present.
> 
> (global)s4.3.2.3, para above table 56: s/iff/if and only if/

Fixed.

> s4.3.2.3: LOG2_FRAC_TABLE is missing.

Text now says that the table is in rate.c

> s4.3.3: It would be helpful to explain either here, or at the outset of
> s4.3 overall, how the concept of energy bands and MDCT bins applies to
> the CELT part of the codec, and just how many bands and bins are used.
> Some of this is contained in s5.3.2, but the magic number 17 appears
> later in 4.3.3 which is presumably something to do with the point in the
> frequency domain that CELT takes over from LP in the hybrid mode.  It
> would make the very complex section 4.3.3 rather easier to understand
> with this extra information - I have to say I struggled!  On reflection,
> I think an example of what bits are allocated to a band and how thay rae
> subsequently used would be quite helpful - Without going to delve into
> the code I am really not clear that I understand just what bits are
> allocated and what they then encode and I have read the text quite a few
> times now.

Added an explanation of band 17 at the beginning of 4.3. There is also
some text about the number of bins per band.

> s4.3.3: Be consistent between 'tone to noise' and 'tone-to-noise'.

The text was rephrased in terms of signal-to-noise.

> s4.3.3:
>>    The band-energy normalized structure of Opus MDCT mode ensures that a
>>    constant bit allocation for the shape content of a band will result
>>    in a roughly constant tone to noise ratio, which provides for fairly
>>    consistent perceptual performance.  The effectiveness of this
>>    approach is the result of two factors: that the band energy, which is
>>    understood to be perceptually important on its own, is always
>>    preserved regardless of the shape precision, and because the constant
>>    tone-to-noise ratio implies a constant intra-band noise to masking
>>    ratio.  Intra-band masking is the strongest of the perceptual masking
>>    effects.  This structure means that the ideal allocation is more
>>    consistent from frame to frame than it is for other codecs without an
>>    equivalent structure.
> This paragraph contains a number of interesting assertions:  Is there a
> reference where one could see them justified (it may be that this is the
> result of original research in the Opus team).

The particular importance of energy preservation was substantively a discovery
of the Vorbis team during the later tuning of that codec. Vorbis encoders 
include a separate step called 'noise normalization' which attempts to restore
lost energy. In CELT this energy preservation was made implicit in the design of the
codec. 

The text here is intended to explain the motivation behind the particular
band-energy normalized structure of this part of the codec in the hopes
that a higher-level understanding would help make the component steps make
a little more sense. This is "original research", but there is now a link
to an older paper describing this research:
   [Valin2010]
              Valin, JM., Terriberry, T., Montgomery, C., and G.
              Maxwell, "A High-Quality Speech and Audio Codec With Less
              Than 10 ms delay", IEEE Trans. on Audio, Speech and
              Language Processing, Vol. 18, No. 1, pp. 58-67 2010.


> s4.3.3, paragraphs after the bullet points:  The concepts of 'shape' and
> 'shape encoding' is introcuced here without explicit definition.  Are we
> talking about the shape windowing used in FFT/MDCT here? This should be
> made clear.

The concept of shape is now discussed at the beginning of s4.3.

> s4.3.3, 5th para after bullet points: s/In the reference the maximums/In
> the reference implementation the maximums/

Fixed.

> s4.3.3, 6th para after the bullet points:  A table of bands per mode and
> number of MDCT bins  covered would be helpful here in order to  get a
> feeling for the scale of the problem.  Also the cache_caps50 table in
> the code contains the magic number 168.  Where does this come from?

The layout of the bands is now included at the beginning of 4.3. Also, section
4.3.3 now includes an explanation for the size of cache_caps50
(21 bands, 4 values of LM, mono+stereo).
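
The arithmetic behind 168 can be sketched like this. The constant names are made up, and the index layout in caps_index() is an assumption for illustration; the actual ordering should be taken from the code:

```c
enum {
    NB_BANDS    = 21,  /* bands in the standard layout */
    NB_LM       = 4,   /* LM = 0..3 */
    NB_CHANNELS = 2    /* mono and stereo */
};

/* 21 * 4 * 2 = 168 entries. */
enum { CAPS_TABLE_SIZE = NB_BANDS * NB_LM * NB_CHANNELS };

/* Assumed layout: one run of NB_BANDS entries per (LM, channels) pair. */
static int caps_index(int band, int LM, int channels)
{
    return NB_BANDS * (2 * LM + (channels - 1)) + band;
}
```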

> s4.3.3, 6th para after the bullet points:
>> set LM to the shift value for the frame size (e.g. 0 for 120, 1 for
>>    240, 3 for 480),  
> Where do these frame sizes get specified? And what is the total set of
> frame sizes? The text says 'e.g.' (which incidentally should be 'e.g.,')
> implying that this is not the complete set.

The calculation of LM is now given earlier in section 4.3.3
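
As a sketch of that calculation, assuming LM = log2(frame_size/120) and the four frame sizes of 120, 240, 480, and 960 samples at 48 kHz (the function name is hypothetical):

```c
/* Returns LM = log2(frame_size/120), or -1 if frame_size is not one of
 * the four sizes assumed here (120, 240, 480, 960 samples at 48 kHz). */
static int lm_for_frame_size(int frame_size)
{
    int LM;
    for (LM = 0; LM <= 3; LM++) {
        if (frame_size == (120 << LM))
            return LM;
    }
    return -1;
}
```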


> s4.3.3, 6th para after the bullet points: Need to define 'truncating
> integer division' to go with 'exact integer division'.

Replaced with "integer division"

> s4.3.3, 7th para after the bullet points: 
>> The band boosts are represented by a series of binary symbols which
>>    are coded with very low probability.
> How many, at least, and what values?  Are these range encoded? I don't
> see them in the table above or with a PDF specified.

The text has been clarified to say that this is entropy coded. Also, the
probabilities that correspond to 6 bits and 2 bits (1/64 and 1/4, respectively)
are given to emphasize the fact that the number of bits is directly related to
the probabilities.
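
The relation between the two is just bits = log2(1/p). A minimal sketch for probabilities of the form 1/denom with denom a power of two (the helper name is invented here):

```c
/* Cost in whole bits of a symbol with probability 1/denom, where denom
 * is a power of two: bits = log2(denom). Returns -1 otherwise. */
static int bits_for_probability(unsigned denom)
{
    int bits = 0;
    if (denom == 0 || (denom & (denom - 1)) != 0)
        return -1;
    while (denom > 1) { denom >>= 1; bits++; }
    return bits;
}
```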

> s4.3.3, 7th para after the bullet points:
>>    and every time
>>    a band is boosted the initial cost is reduced (down to a minimum of
>>    two).
> Would that be a value of two or two bits? 

Two bits (fixed in the draft).

> s4.3.3: Paragraph on decoding band boosts:  Might be improved by using
> equations rather than the wordy descriptions used at present.
> 
> (global)s4.3.3, para above table 56: s/iff/if and only if/

Fixed.

> s4.3.3, 2nd para after Table 56: 
>> For stereo frames, bits are reserved for intensity stereo and for
>>    dual stereo.  Intensity stereo requires ilog2(end-start) bits.
> The terms 'intenmsity stereo' and 'dual stereo' don't appear to have
> been defined.

Added a definition of intensity, dual, and mid-side stereo to the beginning
of section 4.3. 

> s4.3.3: LOG2_FRAC_TABLE is missing.

Text now says that the table is in rate.c

> 4.3.4.3, last para:
>>    If the decoded vector represents more than one time block, then the
>>    following process is applied separately on each time block. 
> Should this sentence come before the previous paragraph?  There  isn't
> really a 'following process' in this section and I don't think it menas
> the process in s4.3.4.4?


Indeed. Replaced "the following process" with "this spreading process".

> s4.3.4.3, last sentence: 
>> This extra rotation is applied in an interleaved manner with a stride
>>    equal to round(sqrt(N/nb_blocks))
> I think this needs some more explanation for the uninitiated.

Although "stride" is a relatively common term, we added this to the sentence:
"i.e. it is applied independently for each set of samples S_k = {stride*n + k},
n=0..N/stride-1."
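
For readers less used to the term, the index sets can be sketched as below. This only enumerates the indices in S_k and does not apply the rotation; the function name and signature are made up for this example, and stride is taken as an input rather than computed:

```c
#include <stddef.h>

/* Fill out[] with the indices of S_k = {stride*n + k}, n = 0..N/stride-1.
 * Returns the number of indices written. Assumes stride > 0, k < stride,
 * and that out[] has room for N/stride entries. */
static size_t stride_set(size_t N, size_t stride, size_t k, size_t *out)
{
    size_t count = N / stride;
    size_t n;
    for (n = 0; n < count; n++)
        out[n] = stride * n + k;
    return count;
}
```

With N = 8 and stride = 2, the two sets are {0, 2, 4, 6} and {1, 3, 5, 7}, and each set is processed independently.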

> s4.3.4.4:
>> Multiple levels of splitting may be
>>    applied up to a frame size dependent limit. 
> What this limit is does not appear to be defined.

Added the limit of LM+1 splits.

> s4.3.5: The 'collapse' phenomenon is not fully defined, and it would be
> useful to mention why it happens.  Also s/min/minimum/.

Added a short description of the anti-collapse feature and fixed the
s/min/minimum/

> s4.3.6: s/Just like/Just as/

Fixed.

> s4.3.7, last para: I think 'power complementarity' requires further
> explanation or a reference.

Added reference to:
John P. Princen and Alan B. Bradley, "Analysis/synthesis filter bank design based on time domain aliasing cancellation," IEEE Trans. Acoust. Speech Sig. Proc. ASSP-34 (5), 1153-1161 (1986)

> s4.5, para 3: s/To avoid or reduces glitches during these/To avoid or
> reduce glitches during these/

Fixed.

> s4.5.1.1, para 1: s/For for SILK-only/For SILK-only/

Fixed.

> s4.5.1.4, para 2: s/redundant frame is as-is,/redundant frame as-is,/

Fixed.

> s5, figure 16: The Optional High-pass Filter box has two spurious '+'
> symbols on the vertical sides. 

Fixed.

> s5, last para:  A reference for the Auto Regressive Moving Average
> (ARMA) filter would be useful.

This is a simple issue and a reference to the entire theory would have made
the issue more complicated. The parenthetical now states "i.e. with poles and
zeros" to clarify this.

> s5.2.3.4.2.1, title: s/Burgs method/Burg's Method/

Fixed.

> s5.2.3.5, para 2: Expand 'R/D performance' (or probably specify it as
> abbreviation for rate-distortion in para 1).

Done.

> s5.3.5, para 1 below equation: Is E an abbreviation for 'extra'?

Oops. s/extra/E/ in the text below.

> s5.3.6: The abbreviation RD for rate-distortion is defined here (see
> comment on s5.2.3.5).

Fixed.

> s6.1: This section is perhaps a little 'triumphalist' for the reference
> implementation (this may of course be justified!.  The quality metric is
> a 'relative quality metric' and presumably if someone does a *really*
> good job of coding, it is entirely possible that the new algorithms
> might get better than 100 on the quality score (i.e., provide a better
> answer than the reference implementation).

Conformance with the specification is defined by a faithful reproduction of
the specified decoder behavior (see RFC 6569 s4.3). By specifying the
decoder, future encoders have the freedom to have improved quality with the
confidence that they know what output a conforming decoder will produce.

The tests in s6.1 are measuring the quality of the decoded signal against the 
reference decoded signal, not against the original audio, so greater than 100%
wouldn't be possible or meaningful. The test signals have been specially
prepared to thoroughly test a decoder implementation, and they sacrifice encoded
quality in order to rapidly exercise the corner cases.

> s6.2: Just wondering, but is non-standard frame size the only option
> offered by Opus Custom?  If not, probably more text is needed here.  Are
> there any major changes to the algorithms implied by the use of Opus
> Custom?

Opus Custom permits more flexibility in frame sizes at the cost of limiting
the mode switching, requiring more dynamic memory, etc. The algorithm itself
remains exactly the same, though some things like the band count/layout change
to accommodate the different frame sizes while still approximating the same Bark
structure.

> s7.  Referencing SECGUIDE (RFC 3552) seems inappropriate since it occurs
> in such a security considerations section. Just omit it.

Done.

> s11.2: A number of references are to Wikipaedia pages.  While these were
> useful to me in refreshing or initializing my knowledge, they are not
> usually considered adequately stable for use in RFCs.  I fear you may
> have to provide more stable references.

In the cases where we use Wikipedia citations, they are intended as informative
general references for introductory subject matter, and we are not relying on any
specific details or wording that would be affected by further revisions. The
alternatives are often dense academic papers that may not be well suited
to a general codec-implementing audience.


--Boundary_(ID_+C3bRGW2uyqqCN0LMd/yZQ)--

From fluffy@cisco.com  Thu May 17 09:07:48 2012
Return-Path: <fluffy@cisco.com>
X-Original-To: codec@ietfa.amsl.com
Delivered-To: codec@ietfa.amsl.com
Received: from localhost (localhost [127.0.0.1]) by ietfa.amsl.com (Postfix) with ESMTP id 333A821F857D; Thu, 17 May 2012 09:07:48 -0700 (PDT)
X-Virus-Scanned: amavisd-new at amsl.com
X-Spam-Flag: NO
X-Spam-Score: -110.046
X-Spam-Level: 
X-Spam-Status: No, score=-110.046 tagged_above=-999 required=5 tests=[AWL=-0.047, BAYES_00=-2.599, J_CHICKENPOX_13=0.6, RCVD_IN_DNSWL_HI=-8, USER_IN_WHITELIST=-100]
Received: from mail.ietf.org ([12.22.58.30]) by localhost (ietfa.amsl.com [127.0.0.1]) (amavisd-new, port 10024) with ESMTP id TK+FQYnzJvDI; Thu, 17 May 2012 09:07:44 -0700 (PDT)
Received: from mtv-iport-2.cisco.com (mtv-iport-2.cisco.com [173.36.130.13]) by ietfa.amsl.com (Postfix) with ESMTP id E483F21F856F; Thu, 17 May 2012 09:07:43 -0700 (PDT)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=cisco.com; i=fluffy@cisco.com; l=15303; q=dns/txt; s=iport; t=1337270864; x=1338480464; h=subject:mime-version:from:in-reply-to:date:cc: content-transfer-encoding:message-id:references:to; bh=+7VXU0/Uqjw0HmjEr8WbvfXDskmVmkLnY7AOJGlYICQ=; b=Oxjq+HzUXKRghpaaHKf6pMU2GMqGcCDzeBjICW6Cftmmv94oOQz6vra2 KrM9i77fw0hGHk8hs7aaDdsXMSR1es9GyBJbho5IznU4hUrWJpFQWLmRH lHeCmWejIn5BqqkZFRK5BvmTWClNjhuCkIAEiC6V7J+rODeHwNqdxSU9t E=;
X-IronPort-Anti-Spam-Filtered: true
X-IronPort-Anti-Spam-Result: AjcFANohtU+rRDoJ/2dsb2JhbABEgx2wIIEHghUBAQECAQESARQTOgUFCwsYLlcGHAsHB4dnBAybZ6AGixMTAYRhYgOIY40XgRGEZIhigWmDCIE3AQ
X-IronPort-AV: E=Sophos;i="4.75,610,1330905600"; d="scan'208";a="45193506"
Received: from mtv-core-4.cisco.com ([171.68.58.9]) by mtv-iport-2.cisco.com with ESMTP; 17 May 2012 16:07:37 +0000
Received: from [192.168.4.100] (sjc-fluffy-8914.cisco.com [10.20.249.165]) by mtv-core-4.cisco.com (8.14.3/8.14.3) with ESMTP id q4HG7a6o025259; Thu, 17 May 2012 16:07:36 GMT
Mime-Version: 1.0 (Apple Message framework v1084)
Content-Type: text/plain; charset=us-ascii
From: Cullen Jennings <fluffy@cisco.com>
In-Reply-To: <1337203600.31554.2213.camel@mightyatom.folly.org.uk>
Date: Thu, 17 May 2012 10:07:36 -0600
Content-Transfer-Encoding: quoted-printable
Message-Id: <6D6B36EF-C4B4-4C65-AA07-6C6A59725291@cisco.com>
References: <1337001184.23527.1544.camel@mightyatom.folly.org.uk> <4FB2F5D0.7070701@jmvalin.ca> <1337203600.31554.2213.camel@mightyatom.folly.org.uk>
To: Elwyn Davies <elwynd@folly.org.uk>
X-Mailer: Apple Mail (2.1084)
Cc: codec@ietf.org, IETF discussion <ietf@ietf.org>, draft-ietf-codec-opus.all@tools.ietf.org
Subject: Re: [codec] [Gen-art] Gen-art last call review of draft-ietf-codec-opus-12.txt (completed)
X-BeenThere: codec@ietf.org
X-Mailman-Version: 2.1.12
Precedence: list
List-Id: Codec WG <codec.ietf.org>
List-Unsubscribe: <https://www.ietf.org/mailman/options/codec>, <mailto:codec-request@ietf.org?subject=unsubscribe>
List-Archive: <http://www.ietf.org/mail-archive/web/codec>
List-Post: <mailto:codec@ietf.org>
List-Help: <mailto:codec-request@ietf.org?subject=help>
List-Subscribe: <https://www.ietf.org/mailman/listinfo/codec>, <mailto:codec-request@ietf.org?subject=subscribe>
X-List-Received-Date: Thu, 17 May 2012 16:07:48 -0000

I think the authors just about have a -14 draft ready but I wanted to
comment on one topic inline ....


On May 16, 2012, at 3:26 PM, Elwyn Davies wrote:

> Hi, Jean-Marc.
>
> ... and thanks for the super-quick response!  You have been quite busy.
>
> I have had a look through the new draft and I think the additions help
> considerably with comprehension for the naive (and to give new
> implementers a way in.)
>
> I'll leave you to negotiate with the RFC Editor over the Wikipaedia
> references.  To quote the RFC Style guide
> http://www.rfc-editor.org/rfc-style-guide/rfc-style
> Section 4.8, item (x) References, last section:
>> URLs and DNS names in RFCs
>>
>>      The use of URLs in RFCs is discouraged, because many URLs are not
>>      stable references.  Exceptions may be made for normative
>>      references in those cases where the URL is demonstrably the most
>>      stable reference available.  References to long-lived files on
>>      ietf.org and rfc-editor.org are generally acceptable.
> They are certainly convenient *as long as they remain in place and
> aren't corrupted*.
>
> I found a couple of trivial editorial nits in the changes:
> s4.3.3 (in the added text):
>> The CELT layer, however, can adapt over a very wide range of rates,
>> and thus has a large number of codebooks sizes
> s/codebooks/codebook/
>
> s4.3.3, para after Table 57: s?the maximums in bit/sample are
> precomputed?the maximums in bits/sample are precomputed?
>
> Also suggest:
> s4.3: Add reference for Bark scale: Zwicker, E. (1961), "Subdivision of
> the audible frequency range into critical bands," The Journal of the
> Acoustical Society of America, 33, Feb., 1961.
>
> A few responses in line below (agreed pieces elided):
>
> Regards,
> Elwyn  Davies
> ====================================================================================
>
> On Tue, 2012-05-15 at 20:33 -0400, Jean-Marc Valin wrote:
>> Hi Elwyn,
>>
>> Thanks for the very thorough review. We've addressed your issues and
>> submitted draft version -13. See our response to each of the issues you
>> raised (aggregated from all the authors) in the attached document.
>
> Thanks very much to all the authors.
>>
>> Cheers,
>>
>> 	Jean-Marc
>>
>
>
>> Elwyn Davies wrote:
>>> Major issues:
>>> Can we accept the code as normative?  If not how do we proceed?
>>
>> The issue with code being normative was specifically addressed in
>> the guidelines document for this WG (RFC 6569).
>>
> Yes.  I failed to read this although Robert Sparks did point out to me
> that the code was normative - but he didn't think he said this was
> agreed in advance (or maybe I didn't understand what he was saying).
>
> To be honest I would like to have had the time to tie the text to the
> code but realistically that is several weeks or even months work to do
> it properly - without that I feel that I have only done half a job. I
> may decide that it interests me enough to have a superficial go next
> week but no promises!
>
>>
>>> Minor issues:
>>> Contrast between decoder descriptions of LP part and CELT part:  The
>>> SILK descriptions go into gory detail on the values used in lots of
>>> tables, etc., whereas the CELT part has a very limited treatment of
>> the
>>> numeric values used (assuming reliance on finding the values in the
>>> reference implementation, either explictly or implicitly).  There
>> are
>>> things to be said for both techniques.  I was wondering (while
>> reading
>>> the SILK description) if the authors have any means of automatically
>>> generating the tables from the code in the SILK part (or vice versa)
>> to
>>> avoid double maintenance. On the other hand, there are pieces of the
>>> CELT decoder description (especially in s4.3.3 where knowing numbers
>> of
>>> bands, etc.) where some actual numbers would help comprehension.
>>>
>>
>> We have made many changes to section 4.3 (and 4.3.3 specifically) to
>> address
>> the specific issues below. As for the tables, they are not generated
>> automatically.
>
> I think this is addressed satisfactorily now.  There is still some
> difference but it is much reduced and not so glaring now. The addition
> of the band tables really helps.
>
>>
>>
>>> s2 (and more generally):  The splitting of the signal in the
>> frequency
>>> domain into signal (components?) below and above 8kHz presumably
>>> requires that the signal is subjected to a Discrete Fourier
>> Transform to
>>> allow the signal to be split.  I think sometging is needed in s2 to
>>> explain how this is managed (or if I don't understand, to explain
>> why it
>>> isn't necessary).
>>
>> No DFT is used. The lower band is obtained through resampling (which
>> is already described) and the higher band is obtained by not coding
>> the lower band with CELT (the text says that CELT starts at band 17 in
>> hybrid mode). The explanation was reworded to make this as clear as
>> possible at this point in the text.
>
> [I thought I had reworded this comment in the 2nd version to talk about
> MDCT but no matter].
> Yes, para 5 of s2 does say that the bands are discarded.  I think it
> would useful to have a concrete statement in the new text added to s4.3
> that bands 0 to 16 are discarded in hybrid mode (thereby making the 17
> in the band boost section more obvious) [There is a comment below that
> you have added some text about band 17 in section 4.3 but I can't see
> it].
>
>
>>
>>> s4.2.5, para 3:
>>>>    When switching from 20 ms to 10 ms, the 10 ms
>>>>    20 ms Opus frame, potentially leaving a hole that needs to be
>>>>    concealed from even a single packet loss.
>>> How?
>>
>> As explained in the LBRR text, a 10 ms frame will only contain 10 ms
>> LBRR data even if the previous frame was 20 ms, so there's 10 ms
>> "missing".
> Indeed - that there would be a hole was clear.  The 'How' referred to
> how would it be concealed.  Having read further by now this may be down
> to Packet Loss Concealment - so maybe all it needs is a foward ref to
> s4.4.
>>
>>> s4.3:
>>> As a complete newcomer to CELT, I would have appreciated a more high
>>> level understanding of what CELT is doing at this point.  I  tried
>>> reading s4.3 without any additional input and found it very hard
>> going.
>>> Eventually I gave up and went looking for some additional input.
>> This
>>> presentation seems to have a useful view
>>> http://www.ietf.org/proceedings/79/slides/codec-2.pdf
>>> I think that it would be extremely helpful to have a description
>> similar
>>> to this at this point in the document, even though there is some
>>> material in section 5.3 which could also be forward referenced.
>> Still
>>> the material in s5.3 does not start from the basic principles that
>> CELT
>>> is using, and since these are essentially novel, it would be very
>> good
>>> to give prospective implementers/users an understanding of what is
>> going
>>> on.  Incidentally, I found the above IETF presentation more useful
>> than
>>> http://www.celt-codec.org/presentations/misc/lca-celt.pdf
>>> Note that the SILK part seems rather less opaque.  It would also be
>>> useful to indicate numerically how many bands are involved and what
>> the
>>> number of MDCT bins are in the various bands.
>>
>> The intro of section 4.3 has been expanded with general information
>> about CELT
>> similar to what the codec-2.pdf slides from 79 included.
>>
> I think this is an excellent improvement.
>>
>>> Nits/editorial comments:
>>>
>>> global: bytes ->  octets
>>
>> We believe that in the field of audio codecs, the mention of "byte"
>> without
>> further context is well understood to mean 8 bits.
>=20
> True. But this is a matter of IETF style.  The style is to use octets
> where we mean 8 bit bytes. I think you now have a mixture!


This topic has generated more phone calls and discussion for me than all
the rest of your review comments put together :-)

The use of the term "byte" does not seem to have caused any confusion or
issues in either the spec writing or in the implementations, and it is the
term generally used by the community to refer to a group of 8 bits, so
folks would like to stay with that. If the IESG wants to put a discuss on
this, preferably with a pointer to the relevant section of the discuss
criteria document, we can deal with that then. If using octet is the
IESG's desire, the document can easily be moved to use octet, and my belief
is that implementors, though finding that sort of weird, will not have
implementation or interoperability problems from that change.

Thanks, Cullen <CODEC WG Co-Chair>


>=20
>>=20
>>> global: The form/terminology Q<n>  (e.g., Q13, Q15, Q16) ought to be
>>> explained.
>>=20
>> This was already defined in the section on notation:
>>   The notation "Q<n>", where n
>>   is an integer, denotes the number of binary digits to the right of
>>   the decimal point in a fixed-point number.
> Sorry - I missed that.
>>=20
>>> s3.4: I am not keen on duplicating normative requirements in this way
>>> (double maintenance issue).  It would be better to put explicit numbered
>>> requirements in the sections above and reference the resulting numbers
>>> here.
>>=20
>> A checklist style summary is quite useful for implementors, and worth
>> the maintenance burden for a document that is (hopefully) going to be
>> published once and read many times. The list intentionally does not
>> include any RFC 2119 keywords, to avoid any conflict should there
>> (accidentally) be room to interpret the re-statement any differently
>> from the original statement. Numbering the requirements and
>> referencing the numbers is still a good idea, but it should be
>> possible to read the list without flipping back and forth to the
>> previous sections.
>=20
> Good solution!
>=20
>>=20
>>> s4.1.1:  This is really a global point.  This section refers to
>>> entdec.c.  Presumably (since we haven't reached the code yet) and it
>> is
>>> still compressed, there is some file structure.  I don't think this
>> has
>>> been said above.  It would be good to provide a list of the file
>>> components (i.e., sectional structure of the code) at the start,
>> maybe
>>> even  giving line number positions within the decompressed code.
>>=20
>> In cases where it was deemed necessary (e.g. large functions), there
>> are
>> indeed line numbers in references to the code. As for a list of files
>> we did not think it was useful because one already needs to decompress
>> the code to see the references.
>=20
> OK. We'll have to live with this situation.
>=20
> Having looked at the code, I think it is a considerable pity that it
> isn't Doxygen commented (or some such) throughout so that the whole
> system can be viewed as a Doxygen tree. I can smell roasted programmer
> from here... :-)
>>=20
>>> s4.2.7.5.1, para 1: s/This indexes an element in a coarse codebook,
>>>    selects the PDFs for the second stage of the VQ/This indexes an
>>>    element in a coarse codebook that selects the PDFs for the
>> second stage
>>>    of the VQ/
>>=20
>> The text as written is correct. The index I1 is what selects the PDFs
>> for the second stage, not the vector from the coarse codebook in
>> Tables 23 and 24. I.e., it's saying, "This does A, B, and C."
>=20
> OK.  I think it might be clearer if the three things were separated out
> as a list.  Now you point it out I can read it correctly but it
> triggered minor confusion - worth turning the three things into bullet
> points.
>=20
> NEW:  s4.3: Add reference for Bark scale: Zwicker, E. (1961),
> "Subdivision of the audible frequency range into critical bands," The
> Journal of the Acoustical Society of America, 33, Feb., 1961.
>=20
> Generally the new intro to s4.3 helps *a lot*.
>>> s4.3.2.1: Avoid the equation picture splitting across page boundaries.
>>> In the current version it is unclear what the denominator is (use the
>>> needLines processing directive in xml2rfc).  The same applies to the
>>> equation below Table 57 in s4.3.4.3.
>>=20
>> It's not quite clear how to use needLines without undesirable
>> side-effects.
>> Hopefully this is something the RFC editor should be able to handle.
> Indeed.. but I don't know what undesirable side effects there are?
> AFAIK (and in my own usage) it just ensures there are n lines available
> on the current page at some point in the text and forces a page throw if
> not.
>>=20
>>=20
>>> s4.3.3: (was specified as s 4.3.2.3 which was wrong) Paragraph on
>>> decoding band boosts:  Might be improved by using
>>> equations rather than the wordy descriptions used at present.
>=20
> Any thoughts on this one
>>>=20
>>> (global)s4.3.2.3, para above table 56: s/iff/if and only if/
>>=20
>> Fixed.
>>=20
>>=20
>>> s4.3.3: <<snip>>.
>>=20
>> Added an explanation of band 17=20
>=20
> I don't think this happened.
>=20
>>=20
>>=20
>>> s6.1: This section is perhaps a little 'triumphalist' for the
>> reference
>>> implementation (this may of course be justified!).  The quality
>> metric is
>>> a 'relative quality metric' and presumably if someone does a
>> *really*
>>> good job of coding, it is entirely possible that the new algorithms
>>> might get better than 100 on the quality score (i.e., provide a
>> better
>>> answer than the reference implementation).
>>=20
>> Conformance with the specification is defined by a faithful
>> reproduction of
>> the specified decoder behavior (see RFC 6569 s4.3). By specifying the
>> decoder, future encoders have the freedom to have improved quality
>> with the
>> confidence that they know what output a conforming decoder will
>> produce.
>>=20
>> The tests in s6.1 are measuring the quality of the decoded signal
>> against the=20
>> reference decoded signal, not against the original audio, so greater
>> than 100%
>> wouldn't be possible or meaningful. The test signals have been
>> specially
>> prepared to thoroughly test a decoder implementation, and they
>> sacrifice encoded
>> quality in order to rapidly exercise the corner cases.
>>=20
> You might want to add this comment to the text.
>=20
> As regards the 100 limit, I was sort of assuming that the quality figure
> was derived from improving on the 48dB SNR figure.  Probably a
> misreading.  As a matter of interest, would one be able to tell from the
> tests that a putative new implementation really was 'better' in some
> sense? Or is this now almost a subjective matter that can only be
> determined by extensive listening tests?  I got the impression we may be
> converging on the diminishing returns point.
>=20
> /Elwyn
>=20
>>=20
>=20
>=20


From internet-drafts@ietf.org  Thu May 17 09:45:02 2012
Return-Path: <internet-drafts@ietf.org>
X-Original-To: codec@ietfa.amsl.com
Delivered-To: codec@ietfa.amsl.com
Received: from localhost (localhost [127.0.0.1]) by ietfa.amsl.com (Postfix) with ESMTP id 3771921F8674; Thu, 17 May 2012 09:45:02 -0700 (PDT)
X-Virus-Scanned: amavisd-new at amsl.com
X-Spam-Flag: NO
X-Spam-Score: -102.599
X-Spam-Level: 
X-Spam-Status: No, score=-102.599 tagged_above=-999 required=5 tests=[BAYES_00=-2.599, USER_IN_WHITELIST=-100]
Received: from mail.ietf.org ([12.22.58.30]) by localhost (ietfa.amsl.com [127.0.0.1]) (amavisd-new, port 10024) with ESMTP id SMlQxJbDBkgj; Thu, 17 May 2012 09:45:01 -0700 (PDT)
Received: from ietfa.amsl.com (localhost [127.0.0.1]) by ietfa.amsl.com (Postfix) with ESMTP id 86FAC21F864F; Thu, 17 May 2012 09:45:01 -0700 (PDT)
MIME-Version: 1.0
Content-Type: text/plain; charset="utf-8"
Content-Transfer-Encoding: quoted-printable
From: internet-drafts@ietf.org
To: i-d-announce@ietf.org
X-Test-IDTracker: no
X-IETF-IDTracker: 4.02
Message-ID: <20120517164501.2106.98162.idtracker@ietfa.amsl.com>
Date: Thu, 17 May 2012 09:45:01 -0700
Cc: codec@ietf.org
Subject: [codec] I-D Action: draft-ietf-codec-opus-14.txt
X-BeenThere: codec@ietf.org
X-Mailman-Version: 2.1.12
Precedence: list
List-Id: Codec WG <codec.ietf.org>
List-Unsubscribe: <https://www.ietf.org/mailman/options/codec>, <mailto:codec-request@ietf.org?subject=unsubscribe>
List-Archive: <http://www.ietf.org/mail-archive/web/codec>
List-Post: <mailto:codec@ietf.org>
List-Help: <mailto:codec-request@ietf.org?subject=help>
List-Subscribe: <https://www.ietf.org/mailman/listinfo/codec>, <mailto:codec-request@ietf.org?subject=subscribe>
X-List-Received-Date: Thu, 17 May 2012 16:45:02 -0000

A New Internet-Draft is available from the on-line Internet-Drafts
directories. This draft is a work item of the Internet Wideband Audio Codec
Working Group of the IETF.

	Title           : Definition of the Opus Audio Codec
	Author(s)       : Jean-Marc Valin
                          Koen Vos
                          Timothy B. Terriberry
	Filename        : draft-ietf-codec-opus-14.txt
	Pages           : 331
	Date            : 2012-05-17

   This document defines the Opus interactive speech and audio codec.
   Opus is designed to handle a wide range of interactive audio
   applications, including Voice over IP, videoconferencing, in-game
   chat, and even live, distributed music performances.  It scales from
   low bitrate narrowband speech at 6 kb/s to very high quality stereo
   music at 510 kb/s.  Opus uses both linear prediction (LP) and the
   Modified Discrete Cosine Transform (MDCT) to achieve good compression
   of both speech and music.


A URL for this Internet-Draft is:
http://www.ietf.org/internet-drafts/draft-ietf-codec-opus-14.txt

Internet-Drafts are also available by anonymous FTP at:
ftp://ftp.ietf.org/internet-drafts/

This Internet-Draft can be retrieved at:
ftp://ftp.ietf.org/internet-drafts/draft-ietf-codec-opus-14.txt

The IETF datatracker page for this Internet-Draft is:
https://datatracker.ietf.org/doc/draft-ietf-codec-opus/


From jmvalin@jmvalin.ca  Thu May 17 09:49:14 2012
Return-Path: <jmvalin@jmvalin.ca>
X-Original-To: codec@ietfa.amsl.com
Delivered-To: codec@ietfa.amsl.com
Received: from localhost (localhost [127.0.0.1]) by ietfa.amsl.com (Postfix) with ESMTP id 0C31921F86A6; Thu, 17 May 2012 09:49:14 -0700 (PDT)
X-Virus-Scanned: amavisd-new at amsl.com
X-Spam-Flag: NO
X-Spam-Score: -2.299
X-Spam-Level: 
X-Spam-Status: No, score=-2.299 tagged_above=-999 required=5 tests=[AWL=-0.300, BAYES_00=-2.599, J_CHICKENPOX_13=0.6]
Received: from mail.ietf.org ([12.22.58.30]) by localhost (ietfa.amsl.com [127.0.0.1]) (amavisd-new, port 10024) with ESMTP id uM1zOyNLW1da; Thu, 17 May 2012 09:49:13 -0700 (PDT)
Received: from relais.videotron.ca (relais.videotron.ca [24.201.245.36]) by ietfa.amsl.com (Postfix) with ESMTP id 3181621F866B; Thu, 17 May 2012 09:49:13 -0700 (PDT)
MIME-version: 1.0
Content-type: multipart/mixed; boundary="Boundary_(ID_9fGhe3E5xtAslVrakW0puQ)"
Received: from [192.168.1.14] ([96.21.20.94]) by VL-VM-MR003.ip.videotron.ca (Oracle Communications Messaging Exchange Server 7u4-22.01 64bit (built Apr 21 2011)) with ESMTP id <0M4600ALWEQ0NZA0@VL-VM-MR003.ip.videotron.ca>; Thu, 17 May 2012 12:49:12 -0400 (EDT)
Message-id: <4FB52BFE.4070606@jmvalin.ca>
Date: Thu, 17 May 2012 12:49:02 -0400
From: Jean-Marc Valin <jmvalin@jmvalin.ca>
User-Agent: Mozilla/5.0 (X11; Linux i686 on x86_64; rv:12.0) Gecko/20120428 Thunderbird/12.0.1
To: Elwyn Davies <elwynd@folly.org.uk>
References: <1337001184.23527.1544.camel@mightyatom.folly.org.uk> <4FB2F5D0.7070701@jmvalin.ca> <1337203600.31554.2213.camel@mightyatom.folly.org.uk>
In-reply-to: <1337203600.31554.2213.camel@mightyatom.folly.org.uk>
X-Enigmail-Version: 1.4.1
Cc: General Area Review Team <gen-art@ietf.org>, "codec@ietf.org" <codec@ietf.org>, IETF discussion <ietf@ietf.org>, draft-ietf-codec-opus.all@tools.ietf.org
Subject: Re: [codec] [Gen-art] Gen-art last call review of draft-ietf-codec-opus-12.txt (completed)
X-BeenThere: codec@ietf.org
X-Mailman-Version: 2.1.12
Precedence: list
List-Id: Codec WG <codec.ietf.org>
List-Unsubscribe: <https://www.ietf.org/mailman/options/codec>, <mailto:codec-request@ietf.org?subject=unsubscribe>
List-Archive: <http://www.ietf.org/mail-archive/web/codec>
List-Post: <mailto:codec@ietf.org>
List-Help: <mailto:codec-request@ietf.org?subject=help>
List-Subscribe: <https://www.ietf.org/mailman/listinfo/codec>, <mailto:codec-request@ietf.org?subject=subscribe>
X-List-Received-Date: Thu, 17 May 2012 16:49:14 -0000

This is a multi-part message in MIME format.

--Boundary_(ID_9fGhe3E5xtAslVrakW0puQ)
Content-type: text/plain; CHARSET=US-ASCII
Content-transfer-encoding: 7BIT

Hi Elwyn,

We've submitted a -14 draft to address your comments. Again, see the
response to each of these issues in the attached document.

Thanks again,

	Jean-Marc


On 12-05-16 05:26 PM, Elwyn Davies wrote:
> Hi, Jean-Marc.
> 
> ... and thanks for the super-quick response!  You have been quite busy.
> 
> I have had a look through the new draft and I think the additions help
> considerably with comprehension for the naive (and to give new
> implementers a way in.)


--Boundary_(ID_9fGhe3E5xtAslVrakW0puQ)
Content-type: text/plain; CHARSET=US-ASCII; name=genart2.txt
Content-transfer-encoding: 7BIT
Content-disposition: attachment; filename=genart2.txt

On 12-05-16 05:26 PM, Elwyn Davies wrote:
>> The CELT layer, however, can adapt over a very wide range of rates,
>> and thus has a large number of codebooks sizes
> s/codebooks/codebook/

Fixed.

> s4.3.3, para after Table 57: s?the maximums in bit/sample are
> precomputed?the maximums in bits/sample are precomputed?

Fixed.

> Also suggest:
> s4.3: Add reference for Bark scale: Zwicker, E. (1961), "Subdivision of
> the audible frequency range into critical bands," The Journal of the
> Acoustical Society of America, 33, Feb., 1961.

Done.

>> No DFT is used. The lower band is obtained through resampling (which
>> is already described) and the higher band is obtained by not coding
>> the lower band with CELT (the text says that CELT starts at band 17 in
>> hybrid mode). The explanation was reworded to make this as clear as
>> possible at this point in the text.
> 
> [I thought I had reworded this comment in the 2nd version to talk about
> MDCT but no matter]. 
> Yes, para 5 of s2 does say that the bands are discarded.  I think it
> would be useful to have a concrete statement in the new text added to s4.3
> that bands 0 to 16 are discarded in hybrid mode (thereby making the 17
> in the band boost section more obvious) [There is a comment below that
> you have added some text about band 17 in section 4.3 but I can't see
> it].

Sorry, we started working on the revision as soon as you sent the first part
of the review, and then we just copied the new parts (didn't notice some of
the review changed).

Also, the reason you didn't see the new explanations about band 17 in s4.3
is that we moved them to s2 para 5, to help make the explanation of the
signal splitting clearer, but forgot to update the response to indicate
that (sorry about that). However, it probably does make sense to explain it
in both places, so the sentence,
  "In hybrid mode, the first 17 bands (up to 8 kHz) are not coded."
has been added to s4.3 as well.

>> As explained in the LBRR text, a 10 ms frame will only contain 10 ms
>> LBRR data even if the previous frame was 20 ms, so there's 10 ms
>> "missing".
> Indeed - that there would be a hole was clear.  The 'How' referred to
> how would it be concealed.  Having read further by now this may be down
> to Packet Loss Concealment - so maybe all it needs is a forward ref to
> s4.4. 

Reference added.

>> We believe that in the field of audio codecs, the mention of "byte"
>> without
>> further context is well understood to mean 8 bits.
> 
> True. But this is a matter of IETF style.  The style is to use octets
> where we mean 8 bit bytes. I think you now have a mixture!
> 
>>

Indeed, there's a bit of inconsistency here. Considering Cullen's email,
the document now uses "byte" consistently.

>>> s4.2.7.5.1, para 1: s/This indexes an element in a coarse codebook,
>>>     selects the PDFs for the second stage of the VQ/This indexes an
>>>     element in a coarse codebook that selects the PDFs for the
>> second stage
>>>     of the VQ/
>>
>> The text as written is correct. The index I1 is what selects the PDFs
>> for the second stage, not the vector from the coarse codebook in
>> Tables 23 and 24. I.e., it's saying, "This does A, B, and C."
> 
> OK.  I think it might be clearer if the three things were separated out
> as a list.  Now you point it out I can read it correctly but it
> triggered minor confusion - worth turning the three things into bullet
> points.

This is not a bad idea. I agree it helps make things clearer. Done.

> NEW:  s4.3: Add reference for Bark scale: Zwicker, E. (1961),
> "Subdivision of the audible frequency range into critical bands," The
> Journal of the Acoustical Society of America, 33, Feb., 1961.

Done (as stated above).

>>> s4.3.3: (was specified as s 4.3.2.3 which was wrong) Paragraph on
>> decoding band boosts:  Might be improved by using
>>> equations rather than the wordy descriptions used at present.
> 
> Any thoughts on this one

Oops, that one slipped through. While most of the text is actually
describing an algorithm rather than an equation, it was possible to simplify
the part about the quanta with an equation. The text now reads:
"For each band from the coding start (0 normally, but 17 in Hybrid mode)
to the coding end (which changes depending on the signaled bandwidth), the boost quanta
in units of 1/8 bit is calculated as: quanta = min(8*N, max(48, N))."
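
To make the formula above concrete, here is a minimal Python sketch. It is
not taken from the reference code. It assumes N is the size of the band in
question (the quoted sentence does not define N), and the names
boost_quanta, quanta_per_band and band_sizes are hypothetical, chosen only
for illustration.

```python
def boost_quanta(n):
    """Boost quanta for one band, in units of 1/8 bit.

    n stands for the N of the formula quoted above; it is assumed here
    to be the size of the band, which the quoted text does not spell out.
    """
    return min(8 * n, max(48, n))


def quanta_per_band(band_sizes, hybrid=False):
    """Return {band index: quanta} from the coding start to the coding end.

    band_sizes is a hypothetical list holding one N per band; the coding
    end is simply taken to be len(band_sizes) in this sketch.
    """
    start = 17 if hybrid else 0  # 0 normally, but 17 in Hybrid mode
    return {band: boost_quanta(band_sizes[band])
            for band in range(start, len(band_sizes))}
```

Read this way, the formula has three regimes: for N below 6 the quanta is
8*N (one bit per unit of N), for N from 6 to 48 it is a flat 48 (six bits),
and for larger N it is N itself (1/8 bit per unit of N).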

>>> s4.3.3: <<snip>>.
>>
>> Added an explanation of band 17 
> 
> I don't think this happened.

See above -- operator error. It's fixed now.

>> The tests in s6.1 are measuring the quality of the decoded signal
>> against the 
>> reference decoded signal, not against the original audio, so greater
>> than 100%
>> wouldn't be possible or meaningful. The test signals have been
>> specially
>> prepared to thoroughly test a decoder implementation, and they
>> sacrifice encoded
>> quality in order to rapidly exercise the corner cases.
>>
> You might want to add this comment to the text.

Added the comment about 100 being the max. As for the other part, the test
vector section already states that:
"These test vectors were created specifically to exercise all aspects of the
decoder and therefore the audio quality of the decoded output is
significantly lower than what Opus can achieve in normal operation."

> As regards the 100 limit, I was sort of assuming that the quality figure
> was derived from improving on the 48dB SNR figure.  Probably a
> misreading.  As a matter of interest, would one be able to tell from the
> tests that a putative new implementation really was 'better' in some
> sense? Or is this now almost a subjective matter that can only be
> determined by extensive listening tests?  I got the impression we may be
> converging on the diminishing returns point.

You can't have a "better" decoder because the reference implementation is
*by definition* the best decoder possible. From there, the encoder can
be improved to optimize the quality of a bitstream to be decoded by that
reference decoder. The encoder included with the reference is mature enough
that improvements usually need to be validated with human listening tests;
objective quality measurements aren't quite reliable enough alone to distinguish
'different' from 'better', unless the change is very significant.


--Boundary_(ID_9fGhe3E5xtAslVrakW0puQ)--